
[Programming problem] Data Classification Processing

Popularity index: 184599 · Time limit: C/C++ 1 s, other languages 2 s · Memory limit: C/C++ 32 MB, other languages 64 MB

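In the information society there is a huge amount of data to analyze and process. For example, a police department analyzes ID card numbers, QQ users, mobile phone numbers, bank account numbers, and other information and activity records. Given the collected big data and a set of classification rules, a big-data classification program outputs the data grouped by rule.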

Given a classification rule set $R = \{R_1, R_2, \dots, R_m\}$, normalize it as follows:
• sort the integers in $R$ in ascending order;
• remove duplicate elements from $R$.
Denote the normalized rule set by $r = \{r_1, r_2, \dots, r_k\}$, where $k \le m$ because duplicates have been removed.
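The normalization step amounts to deduplicating and then sorting. Below is a minimal Python sketch. It assumes the rules have already been parsed into integers, and the function name `normalize_rules` is hypothetical, not part of the problem:

```python
from typing import List


def normalize_rules(rules: List[int]) -> List[int]:
    # set() drops duplicate rules; sorted() on ints gives ascending numeric order
    return sorted(set(rules))


print(normalize_rules([6, 3, 6, 3, 0]))  # [0, 3, 6]
```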

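For the collected data set $I$, consisting of a number of integers, use the normalized rule set $r$ to output the classification result as follows:

• For the $i$-th rule $r_i$: if $I$ contains an integer that has $r_i$ as a contiguous substring, the rule is valid. In that case, output how many data items match the rule, together with each matching item's position in $I$ and the item itself.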

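A substring is a new string obtained by selecting a contiguous run of characters from the original string (the whole string, or none of it, may be selected). In this problem, treat each integer as a string of digits.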
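If each integer is treated as its decimal digit string, the validity test reduces to a substring check. The small sketch below assumes the integers are written without leading zeros, so that `str()` reproduces the input digits. `contains_rule` is a hypothetical helper name:

```python
def contains_rule(rule: int, value: int) -> bool:
    # Python's `in` on strings tests for a contiguous substring
    return str(rule) in str(value)


print(contains_rule(23, 231))    # True
print(contains_rule(23, 213))    # False: '2' and '3' are not adjacent
print(contains_rule(3, 453456))  # True
print(contains_rule(0, 123))     # False
```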

Input description:

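The first line contains an integer $n$, the number of items in the data set $I$, followed on the same line by $n$ integers $I_1, I_2, \dots, I_n$, the data.
The second line contains an integer $m$, the number of rules in the rule set $R$, followed on the same line by $m$ integers $R_1, R_2, \dots, R_m$, the rules.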

Output description:

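On a single line:
First output an integer $t$, the total number of integers output after it. Put simply, $t$ counts how many numbers you output in the rest of the line.
Then, for each normalized rule in order: if it is valid, output the rule itself, then an integer $c$, the number of data items matching the rule, then $c$ pairs (position, value) giving each matching item's position in $I$ and the item itself, with positions counted from $0$. If the rule is invalid, skip it.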
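Putting the normalization, the substring test and the output layout together gives the following sketch. It assumes the data items are kept as the digit strings read from the input, so positions and values are printed exactly as given. `classify` is a hypothetical name:

```python
from typing import List


def classify(data: List[str], rules: List[int]) -> str:
    out: List[str] = []
    for r in sorted(set(rules)):  # normalized rule set, ascending
        key = str(r)
        # (position, value) for every item containing the rule as a substring
        hits = [(j, v) for j, v in enumerate(data) if key in v]
        if not hits:
            continue  # invalid rule: nothing is output for it
        out += [key, str(len(hits))]
        for j, v in hits:
            out += [str(j), v]
    # the leading count covers everything after it
    return " ".join([str(len(out))] + out)


data = "123 456 786 453 46 7 5 3 665 453456 745 456 786 453 123".split()
print(classify(data, [6, 3, 6, 3, 0]))
```

On the data from Example 1 this prints the expected line, starting with `30 3 6 0 123`. If no rule is valid, it returns just `0`.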

Example 1

Input

15 123 456 786 453 46 7 5 3 665 453456 745 456 786 453 123
5 6 3 6 3 0

Output

30 3 6 0 123 3 453 7 3 9 453456 13 453 14 123 6 7 1 456 2 786 4 46 8 665 9 453456 11 456 12 786

Explanation

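In this example, the original data set is $I = \{123, 456, 786, 453, 46, 7, 5, 3, 665, 453456, 745, 456, 786, 453, 123\}$ and the original rule set is $R = \{6, 3, 6, 3, 0\}$.
The normalized rule set is $r = \{0, 3, 6\}$.
Then classify $I$:
For rule $0$: no item in $I$ contains $0$ as a contiguous substring, so the rule is invalid and is skipped.
For rule $3$: the items in $I$ containing $3$ as a contiguous substring are $123$, $453$, $3$, $453456$, $453$ and $123$, so the rule is valid. Following the output description, output the rule itself ($3$), then the number of matches ($6$), then each matching item's position in $I$ and the integer itself: $0\ 123\ 3\ 453\ 7\ 3\ 9\ 453456\ 13\ 453\ 14\ 123$.
For rule $6$: the items in $I$ containing $6$ as a contiguous substring are $456$, $786$, $46$, $665$, $453456$, $456$ and $786$, so the rule is valid. Following the output description, output the rule itself ($6$), then the number of matches ($7$), then each matching item's position in $I$ and the integer itself.
Don't forget the integer $t$ at the start of the output: in this example 30 numbers are output in total, so $t = 30$.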
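The leading integer can be computed without counting tokens. Each valid rule $r_i$ contributes the rule itself, its match count $c_i$, and $c_i$ (position, value) pairs. Writing $t$ for the leading integer and plugging in the sample's two valid rules, $3$ (six matches) and $6$ (seven matches):

$$t = \sum_{r_i \text{ valid}} \left(2 + 2c_i\right) = (2 + 2 \cdot 6) + (2 + 2 \cdot 7) = 14 + 16 = 30$$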

Remarks:

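This problem statement has been rewritten by Nowcoder; you may want to read the original statement, which is attached here as well.
[The original statement follows]
Take each R<i> from R in turn, process I, and find the I that satisfy the condition:
The digits of the integer I must contain the digits of R<i> contiguously. For example, if R<i> is 23 and I is 231, then I contains R<i> and the condition is satisfied.
In ascending order of R<i>:
(1) first output R<i>;
(2) then output the number of I that satisfy the condition;
(3) then output the position index of each satisfying I in the I sequence (starting from 0);
(4) finally output the I itself.
Additional conditions:
(1) R<i> must be sorted in ascending order. For identical R<i>, output only the one with the smallest index together with the I that satisfy it; those with larger indices are filtered out.
(2) If no I satisfies the condition, the corresponding R<i> is not output.
(3) Finally, the first integer of the output sequence must record how many integers follow it (not counting this "count" itself).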
 
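Sequence I: 15,123,456,786,453,46,7,5,3,665,453456,745,456,786,453,123 (the first 15 means 15 integers follow)
Sequence R: 5,6,3,6,3,0 (the first 5 means 5 integers follow)
Output: 30, 3,6,0,123,3,453,7,3,9,453456,13,453,14,123,6,7,1,456,2,786,4,46,8,665,9,453456,11,456,12,786
Explanation:
30----30 integers follow
3----after sorting in ascending order, the first R<i> is 0, but no I satisfies it, so 0 is not output; the next R<i> is 3
6---there are 6 I that contain 3
0---the original index of 123 is 0
123---123 contains 3, so it satisfies the condition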


Python

牛客590711274号

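while True:  # other people's solutions are all shorter =_=
    try:
        b=[]
        final=""
        I=input().split()  # split I on whitespace
        R=input().split()  # split R on whitespace
        In = I[0]  # record the count of I
        del I[0]  # delete it once recorded
        Rn=R[0]  # record the count of R
        del R[0]
        R = list(map(int,R))  # convert to int
        R = set(R)  # remove duplicates
        R = list(R)
        R.sort()  # sort ascending
        R = list(map(str,R))  # convert back to str
        for i in range (len(R)):  # loop over each R[i]
            found = False
            a=[]  # reset for each R[i]; one shared list would also work, this just reads more clearly
            str1=""
            for ii in range (len(I)):  # check whether R[i] occurs in each I[ii]
                if R[i] in I[ii]:
                    found = True  # a match was found
                    str2 = str(ii)+ " "+I[ii]  # append in "position value" format
                    a.append(str2)
            if found == True:
                a.insert(0, str(len(a)))  # prepend the length, i.e. how many were found
                a.insert(0,str(R[i]))  # prepend R[i]
                str1 = ' '.join(a)
                b.append(str1)  # copy into b; not strictly needed, just easier to inspect while testing
        final = ' '.join(b)  # join b
        count = final.split()  # split on whitespace to count the elements
        num = len(count)
        final = str(num) + " " + final  # total element count followed by the results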
        print(final)
    except:
        break

Posted 2021-07-05 14:22:27

Python

张浩

a=input().split()[1:]
b=map(int,input().split()[1:])
b=sorted(set(b))
res=[]
for i in b:
    count =0
    x=[]
    for j in range(0,len(a)):
        if str(i) in a[j]:
            count +=1
            x.append(str(j))
            x.append(a[j])
    if count!=0:
        res.append(str(i))
        res.append(str(count))
        res.extend(x)
res.insert(0, str(len(res)))
print(' '.join(res))

Posted 2021-05-11 21:49:28

Python

牛客234648509号

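Help: my self-tests match the required input and output exactly, but after submitting, the pass rate is always 0%. This is driving me crazy.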

def func(tmp_I,tmp_R):

    N_I,list_I = int(tmp_I[0]) , tmp_I[1:] 
    N_R , list_R = int(tmp_R[0]) , sorted(set(tmp_R[1:]), key=int)  # key=int: sort numerically, not as strings
    out_put_list = []

    for r in list_R:
        # check whether r satisfies the condition
        tmp = [(j,list_I[j]) for j in range(N_I) if r in list_I[j] ]
        n_tmp = len(tmp)
        if n_tmp > 0 :
            # condition satisfied:
            # first output r
            out_put_list.append(str(r))
            # then output n_tmp, the number of matching items
            out_put_list.append(str(n_tmp))
            # then output each matching item's index in I, and the item itself
            for index,i in tmp :
                out_put_list.append(str(index))
                out_put_list.append(str(i))

    # finally, prepend the total output length and format the output
    out_put_list = [str(len(out_put_list))] + out_put_list
    out_put = ' '.join(out_put_list)

    return out_put


while 1:
    try:
        # read the input and tidy it up as required
        tmp_I = input().split() #15 123 456 786 453 46 7 5 3 665 453456 745 456 786 453 123
        tmp_R = input().split() #5 6 3 6 3 0
        out_put = func(tmp_I,tmp_R)
        print(out_put) 

    except:
        break

Posted 2021-04-19 09:17:54

Python

潘光祥

while True:
    try:
        I=list(map(str, input().split()[1:]))
        R=list(map(str,sorted(set(map(int, input().split()[1:])))))
        d={}
        li=[]
        ln=[]
        for i in R:
            for j in I:
                if i in j:
                    if i not in d: d[i]=1
                    else: d[i]+=1
        for k,v in d.items(): li.append([k,v])
        for i in li:
            for j in range(len(I)):
                if i[0] in I[j]: i.extend([j, I[j]])
        for i in range(len(li)): ln+=li[i]
        ln.insert(0, len(ln))
        ln=list(map(str, ln))
        print(' '.join(ln))
    except:
        break

Edited 2021-03-10 23:39:42

Python

ks_wang

Does anyone know why this answer has to include try: and while True:? Without them, the pass rate is reported as 0?

try:
    while True:
        def quchongpaixu(l1=[]):
            # deduplicate and sort ascending
            tempSet = set(l1)
            tempList = list(tempSet)
            tempList.sort(reverse=False)
            return tempList

        listI = list(map(str, input().split()))
        listR = list(map(int, input().split()))
        # listI = [15 123 456 786 453 46 7 5 3 665 453456 745 456 786 453 123]
        # listR = [5 6 3 6 3 0]
        iNumber = listI[0]
        listI.pop(0)
        rNumber = listR[0]
        listR.pop(0)
        listR = quchongpaixu(l1=listR)
        # print(listR)
        # Loop over R; search I for each value; record it on a match, ignore it otherwise.
        resultList = []
        for tempR in listR:
            # tempR = 3
            tempR = str(tempR)
            tempResultList = []
            for i, tempI in enumerate(listI):
                # tempI = 123
                if tempR in tempI:
                    tempResultList.append(i)
                    tempResultList.append(tempI)
            if len(tempResultList)>0:
                tempResultList.insert(0, str(int(len(tempResultList)/2)))
                tempResultList.insert(0, tempR)
                resultList.extend(tempResultList)
        # print(resultList)
        resultList.insert(0, len(resultList))
        resultStr = ''
        for tempStr in resultList:
            resultStr = '{} {}'.format(resultStr, tempStr)
        print(resultStr.strip())

        # 12 4 3 1 4598 3 6047 6 7402 26 1 5 11269
        # 12 4 3 1 4598 3 6047 6 7402 26 1 5 11269
except:
    pass

Posted 2021-01-15 22:54:24

Python

大溪地渔民

while True:
    try:
        li, lr = input().split(), list(map(int, input().split()))
        lr.pop(0)
        li.pop(0)
        lr = list(map(str, sorted(list(set(lr)))))
        output = []
        for n in lr:
            temp = [(j, li[j]) for j in range(len(li)) if n in li[j]]
            if temp:
                output.extend([n, str(len(temp))])
                for i, v in temp:
                    output.extend([str(i), v])
        print(str(len(output)) + " " + " ".join(output))
    except:
        break

Posted 2020-12-14 22:41:09

Python

JeremyStar

while True:
    try:
        I = input().strip().split(' ')[1:]
        R = input().strip().split(' ')[1:]  # R=['6','3','6','3','0']
        # Be careful here: sort by the integer value of each string, not by the strings
        # themselves, to avoid '4' sorting after '26'
        R = sorted(set(R), key=int)  # R = ['0','3','6']
        rm = []  # values that should be removed from R
        for R_i in R:
            flag = 0  # for a given R_i, assume no element of I contains R_i
            for I_i in I:
                if R_i in I_i:
                    flag = 1
            if flag == 0:  # R_i should be removed from R
                rm.append(R_i)
        for rm_i in rm:
            R.remove(rm_i)

        # At this point R is ready and matching against I can begin, R = ['3','6']
        L = []
        for R_i in R:
            L.append(R_i)
            count = 0
            index_value = []
            # Do not iterate directly over the elements of I here (for I_i in I): the loop
            # body needs each element's index, and if an element ele occurs more than once,
            # L.index(ele) only returns the index of its first occurrence, not its actual
            # position in the list.
            for index in range(len(I)):
                if R_i in I[index]:
                    count += 1
                    index_value.extend([str(index), I[index]])
            L.append(str(count))
            L.extend(index_value)
        # At this point all the output for 3 and 6 is in list L
        L_len = str(len(L))
        L.insert(0, L_len)
        print(' '.join(L))
    except:
        break

Posted 2020-12-02 10:50:18

Python

牛客517903384号

The code I submitted passes locally, but fails on the test case the system gives.

The test case the system reports as failing:

24 7907 610 4359 55 812 3002 10706 2470 8332 8573 3840 8105 9213 10159 11882 6517 7357 6398 4586 215 3420 4927 7159 9414

10 85 122 46 55 110 47 77 119 50 58

Local output: 16 47 1 7 2470 55 1 3 55 58 1 18 4586 85 1 9 8573

Code:

def findnum(num,findlist):
    find_num=0
    index_list=[]
    for i in range(0,len(findlist)):
        tempdict={}
        if num in findlist[i]:
            find_num+=1
            tempdict[str(i)]=findlist[i]
            index_list.append(tempdict)
    return str(find_num),index_list


if __name__ == "__main__":
    input1=input()
    input2=input()

    # split() with no argument ignores repeated and trailing spaces
    list1=input1.split()
    list2=input2.split()

    list1=list1[1:]
    list2=list2[1:]

    # key=int sorts numerically; a plain string sort would put "10" before "9"
    list2=sorted(list2, key=int)

    total=""
    totaldict={}
    for num in list2:
        if num not in totaldict:
            totaldict[num]="1"
        else:
            continue
        tempnum,templist=findnum(num, list1)
        if len(templist)!=0:
            total+=num+" "+tempnum+" "
            for tempdict in templist:
                total+=list(tempdict.keys())[0]+" "+list(tempdict.values())[0]+" "
    total=total[:-1]
    totallist=total.split(" ")
    total=str(len(totallist))+" "+total
    print(total)

Posted 2020-11-03 20:34:50

Python

佐之剑

def info_make():
    i_info = input().split()[1:]
    r_temp = list(set(map(int, input().split()[1:])))
    r_temp.sort()
    r_info = [str(x) for x in r_temp]
    result = []
    for guize in r_info:
        temp = []
        for index, i in enumerate(i_info):
            if guize in i:
                temp.append(index)
                temp.append(i)
        if len(temp) != 0:
            result.append(guize)
            result.append(len(temp) // 2)
            result.extend(temp)

    print(len(result), *result)


if __name__ == '__main__':
    while True:
        try:
            info_make()
        except EOFError:
            break

Posted 2020-10-19 21:47:41

Python

公子i吃鱼

while True:
    try:
        I = input().split()[1:]
        R = map(int, input().split()[1:])
        R = map(str, sorted(set(R)))
        out = [0]
        for r in R:
            message = [0]
            for index, value in enumerate(I):
                if r in value:
                    message.extend([index, int(value)])
                    message[0] += 1
            if message[0] != 0:
                out.append(int(r))
                out.extend(message)
        out[0] = len(out) - 1
        print(' '.join(map(str, out)))
    except:
        break

Posted 2020-09-26 14:50:46

Python

只爱一个秋

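When I print the whole string in normal order:

print(output.rstrip())

the result is:

When I print all the characters in reverse order:

print(output.rstrip()[::-1])

the result is:

What is the reason? From the reversed output I can tell my answer is correct, but as soon as I print it in normal order, it becomes empty. What is going on? Is something wrong with Nowcoder?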

Edited 2020-09-14 12:13:32

Python

敢梦敢当

Python, run time: 16 ms, with comments

def find(Rx, I):
    Rx = str(Rx)
    count = 0
    resLst = []
    for i in range(len(I)): # loop over the elements of I
        if Rx in I[i]:
            count += 1
            resLst.append(i) # index of the matching I<j> in the I sequence
            resLst.append(I[i]) # the matching I<j> itself
    if count != 0:
        return Rx, count, resLst
    else:
        return '', '', ''

while True:
    try:
        II = input().split()
        lenI = int(II[0]) # length of sequence I
        I = II[1:] # elements of sequence I, as strings
        RR = list(map(int, input().split()))
        lenR = RR[0]
        R = sorted(set(RR[1:]), reverse=False) # deduplicate and sort
        # Take each R<i> from R in turn, process I, and find the I<j> that satisfy the condition
        resRi = [] # R<i> that have matches
        resCount = [] # number of matching I<j>
        resIndexContent = [] # indices (from 0) of the matching I<j> in I, together with the I<j> themselves
        num = 0
        for Rx in R:
            rx, count, resLst = find(Rx, I)
            if rx: # if rx is non-empty
                resRi.append(rx)
                resCount.append(count)
                resIndexContent.append(resLst)
                num += (1 + 1 + len(resLst))
        RES = str(num)
        for i in range(len(resRi)):
            RES = RES + ' ' + str(resRi[i]) + ' ' + str(resCount[i]) + ' ' + (' '.join(str(ii) for ii in resIndexContent[i]))
        print(RES)
    except:
        break

Edited 2020-08-26 17:24:16

Python

sadasd11

while True:
    try:
        i_str = input()
        r_str = input()
        i_len, i_list = int(i_str.split()[0]), i_str.split()[1:]
        r_len, r_nums = int(r_str.split()[0]), list(map(int, r_str.split()[1:]))
        r_nums = sorted(set(r_nums))
        r_dict = {}
        ans_sum = 0
        ans_strings = []
        for r_num in r_nums:
            for i in range(i_len):
                if str(r_num) in i_list[i]:
                    if str(r_num) not in r_dict.keys():
                        r_dict[str(r_num)] = [(str(i), i_list[i])]
                    else:
                        r_dict[str(r_num)].append((str(i), i_list[i]))
            if str(r_num) in r_dict.keys():
                ans_sum += 2 + len(r_dict[str(r_num)])*2
                ans_strings.append(str(r_num))
                ans_strings.append(str(len(r_dict[str(r_num)])))
                for (index, num) in r_dict[str(r_num)]:
                    ans_strings.append(index)
                    ans_strings.append(num)
        ans= str(ans_sum) + ' ' + ' '.join(ans_strings)
        print(ans)
    except:
        break

Posted 2020-08-19 09:37:12

Python

牛客JieSor

while True:
    try:
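        # split() with no argument ignores repeated and trailing spaces;
        # split(' ') would leave '' and int('') raises ValueError
        list1 = [x for x in input().split()]
        list2 = [int(y) for y in input().split()]

        num1 = int(list1[0])
        num2 = list2[0]

        dict1 = {}
        # For sequence R: deduplicate, then sort
        l = sorted(set(list2[1:]))
        # find the matching I[] and build a dict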
        for i in range(len(l)):
            list3 = []
            for j in range(1,num1+1):
                if str(l[i]) in list1[j]:
                    list3.append(str(j-1) + ' ' + list1[j])
                    dict1[str(l[i])] = list3
                else:
                    continue  
        s = ''
        for i in dict1:
            s = s + i + ' ' + str(len(dict1[i])) + ' '
            for j in dict1[i]:
                s += j + ' '
        num = len([x for x in s.split(' ')]) - 1
        print(str(num) + ' ',end='')
        print(s[:-1])
        
    except:
        break

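It works in my local compiler, but as soon as I submit, the output is shown as empty:

Your code has been saved

Wrong answer: your program did not pass all test cases. Click to compare the expected output with your output
Case pass rate: 0.00%

Test case:
24 7907 610 4359 55 812 3002 10706 2470 8332 8573 3840 8105 9213 10159 11882 6517 7357 6398 4586 215 3420 4927 7159 9414
10 85 122 46 55 110 47 77 119 50 58

Expected output:

16 47 1 7 2470 55 1 3 55 58 1 18 4586 85 1 9 8573

Your output:
Empty. Please check your code: do you loop over the input to handle multiple cases? Click to see how to handle multiple cases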
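The judge's hint suggests that a single run may receive several test cases back to back. One way to handle that without relying on `input()` raising at end of file is to read all of standard input at once and consume it token by token. The sketch below assumes each case is exactly the two lines described in the problem (a count followed by that many integers, twice), with cases following each other until EOF. `solve_all` and `main` are hypothetical names:

```python
import sys
from typing import List


def solve_all(text: str) -> List[str]:
    # split() with no argument ignores newlines, repeated and trailing spaces
    tokens = text.split()
    pos = 0
    answers: List[str] = []
    while pos < len(tokens):
        n = int(tokens[pos])
        data = tokens[pos + 1 : pos + 1 + n]  # data items stay as digit strings
        pos += 1 + n
        m = int(tokens[pos])
        rules = [int(x) for x in tokens[pos + 1 : pos + 1 + m]]
        pos += 1 + m
        out: List[str] = []
        for r in sorted(set(rules)):  # deduplicate, then sort numerically
            key = str(r)
            hits = [(j, v) for j, v in enumerate(data) if key in v]
            if hits:
                out += [key, str(len(hits))]
                for j, v in hits:
                    out += [str(j), v]
        answers.append(" ".join([str(len(out))] + out))
    return answers


def main() -> None:
    # read every case at once, then print one answer line per case
    for line in solve_all(sys.stdin.read()):
        print(line)
```

A submission would end with a call to `main()`. Because the input is tokenized rather than read line by line, trailing spaces and blank lines between cases do not matter.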

Posted 2020-07-21 11:05:12

Python

牛客723080320号

def aa(i, nr):
    l = []
    c = 0
    li = []
    for j in range(len(nr)):
        if str(i) in str(nr[j]):
            c += 1
            li.append(j)
            li.append(nr[j])
    if c != 0 :
        l.append(i)
        l.append(c)
        l = l + li
    return l

while True:
    try:
        li = input().split()
        lr = input().split()
        ni = [int(i) for i in li[1:]]
        nr = [int(i) for i in lr[1:]]
        s = set(nr)
        nr = list(s)
        nr.sort()

        l = []
        for i in nr:
            l = l + aa(i, ni)
        res = ''+str(len(l))+" "
        for i in l:
            res += str(i)
            res += " "
        print (res)
    except:
        break

Posted 2020-07-11 12:50:18

Python

从8开始倒车

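There are two tricky points:

1. It is easy to overlook stray characters in the input, such as extra spaces. Clean them up first with strip().

2. When sorting, if you don't convert to numbers, sorted() seems not to work.

Reason: without converting to numbers, strings are compared character by character by their ASCII codes, starting from the first character, so '2' sorts after '16', which is not what we want.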
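The second point is easy to reproduce. Without a key, Python compares strings character by character. The short demonstration below uses rule strings made up for illustration:

```python
rules = ["16", "2", "3", "10", "2"]

print(sorted(set(rules)))           # ['10', '16', '2', '3']  character-by-character order
print(sorted(set(rules), key=int))  # ['2', '3', '10', '16']  numeric order

# Converting to int before deduplicating also merges spellings such as "02" and "2"
print(sorted(set(map(int, ["02", "2", "10"]))))  # [2, 10]
```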

Posted 2020-06-18 22:55:44

Python

清水煮稀饭

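The problem isn't hard. The catch is that after reading R, split(' ') produces an extra empty element (''), which has to be filtered out before the list can be passed to the next function.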

def parseIR(I, R):
  res = []
  for rule in R:
    if not rule.isdigit(): continue
    cnt = 0
    numIndex = []
    for index in range(len(I)):
      if rule in I[index]:
        cnt += 1
        numIndex.append([index, I[index]])
    if cnt>0:
      res.append([rule, cnt, numIndex])
  return res

def printRes(res):
  if len(res)==0:
    print()
  cnt = len(res)*2
  for row in res:
    cnt += len(row[2])*2
  rows = str(cnt)+' '
  for row in res:
    rowStr = '{} {} '.format(row[0], row[1])
    for pair in row[2]:
      rowStr += '{} {} '.format(pair[0], pair[1])
    rows += rowStr
  print(rows[:-1])
    
while True:
  try:
    i = input()
    if i:
      I = i.split(" ")[1:]
      r = input()
      R = r.split(" ")[1:]
      R = list(set(R))
      R = [int(x) for x in R if x.isdigit()]
      R.sort()
      R = [str(x) for x in R]
      res = parseIR(I, R)
      printRes(res)
    else: break
  except: break
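The extra element this post filters out comes from `str.split` with an explicit separator. Every separator produces a field, so a trailing space yields an empty string at the end. With no argument, `split()` treats any run of whitespace as one separator and drops leading and trailing whitespace. The short demonstration below assumes a rule line that ends in a space:

```python
line = "5 6 3 6 3 0 "  # hypothetical input line with a trailing space

print(line.split(" "))  # ['5', '6', '3', '6', '3', '0', '']
print(line.split())     # ['5', '6', '3', '6', '3', '0']

# The empty field breaks int(); inside `try: ... except: break` this
# silently ends the loop before anything is printed.
try:
    rules = [int(x) for x in line.split(" ")]
except ValueError as err:
    print("ValueError:", err)
```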

Posted 2020-06-06 03:38:08

Python

废材大叔_不学不会不练

while True:
    try:
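        # split() with no argument ignores repeated and trailing spaces;
        # split(' ') would leave '' and int('') raises ValueError
        I=list(input().split())
        R=list(map(int,input().split()))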
        Rs=list(set(R[1:]))
        I=I[1:]
        Rs.sort()
        
        Rs=[str(i) for i in Rs]
        string=[]
        for i in range(len(Rs)):
            temp=[];count=0
            for j in range(len(I)):
                if Rs[i] in I[j]: 
                    temp.append(j)
                    temp.append(I[j])
                    count=count+1
            if count!=0:
                string.append(Rs[i])
                string.append(count)
                string=string+temp        
        num=[len(string)]
        num=num+string
        out=' '.join(map(str,num))
        print(out)
    except:
        break

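Could someone please take a look? The sample passes, but when I save and debug, it fails with the message "Empty. Please check your code: do you loop over the input to handle multiple cases?". But I did use the while True: try: except: break pattern, and when I debug the failing case on its own, it passes. Where is the problem?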

Posted 2020-05-18 17:50:39


Problem information

Tags: simulation, hashing, sorting

Source: Huawei machine test programming mock exam 8

30 answers · 2432 favorites · 45701 views
