External Merge Sort, time complexity analysis

论坛 期权论坛 脚本     
匿名技术用户   2021-1-16 08:07   552   0

The time complexity of Merge Sort is nlogn. How about the external merge sort?


1. One pass external merge sort

step 1. break N data to k groups, each group has N/k data, complexity is N/klog(N/k) * k = N log(N/k)

step 2. k way merge sort. Use min_heap, each time to push the data in the heap takes log(k) time, N data takes Nlog(k),

so total time complexity is Nlog(k) + Nlog(N/k) = NlogN


2. Two pass external merge sort

If k is too large, in merge sort, the content read in the input buffer may be too small. For example, if the data needs to be sorted is 500G, one only have 1G memory. Then k= 500. When doing the merge sort, only 2MB will be read in the input buffer(Looks OK, Ah? But if total data is TB, or the memory is much smaller, then each time it may only read KB data in the main memory). The frequency to access disk may be too high. So we could do two path.

First path, break N to k2 group, do merge of each group, each group has N/k2 data. (k2 = 20, each group has 25G data, and we further break each 25G data to 25 pieces of files)

Second path, do k2 way merge sort, takes Nlogk2 time. So total time for two pass merge sort is Nlog(N/k2) + Nlogk2 = NlogN


Reference:

http://en.wikipedia.org/wiki/External_sorting

http://www.mitbbs.com/article_t/JobHunting/32257009.html


分享到 :
0 人收藏
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

积分:7942463
帖子:1588486
精华:0
期权论坛 期权论坛
发布
内容

下载期权论坛手机APP