Data Stream Median

Numbers keep coming, return the median of numbers at every time a new number added.

http://www.lintcode.com/en/problem/data-stream-median/#

Discussion


难。

关于c++里面的head—— priority_queue,默认是最大堆,top返回的总是最大值,最小对可以重新定义compare函数实现,也可以简单的对输入数组取反,top的时候再取反,这样得到的就是最小值了。

维护一个最大堆,里面保存比当前median小的数,维护一个最小堆,里面保存比当前mdedian大的数,要保证最大堆的大小不超过最小堆,但是又不能少于最小堆两个。不满足这两个条件的时候就将当前median插入该堆,并将另一个堆的top取出作为新的median。

两个heap,一个maxHeap 一个minHeap,maxHeap存array里面数的左半部分,minHeap存右半部分,这样如果array的size是奇数的话,两个heap的size应该差1,median应该是size大的那个heap的top,如果array的size是偶数的话,median是两个heap的top的平均值。然后当新加一个数的时候,把这个数和原来的median相比,如果大于原来的median就insert到minHeap,反之存到maxHeap,然后还要检查两个heap的size是否相差2,如果是的话就需要调整下把size大的heap的top放到另外的heap里

lintcode 和leetcode上关于median的定义是不一样的。

Solution -lintcode


vector<int> medianII(vector<int> &nums) {
        // min_heap stores the larger half seen so far.
        priority_queue<int, vector<int>, greater<int>> min_heap;
        // max_heap stores the smaller half seen so far.
        priority_queue<int, vector<int>, less<int>> max_heap;

        vector<int> ans;
        for (const auto& num : nums) {
            if (max_heap.empty() || num > max_heap.top()) {
                min_heap.emplace(num);
                if (min_heap.size() > max_heap.size() + 1) {
                    max_heap.emplace(min_heap.top());
                    min_heap.pop();
                }
            } else {
                max_heap.emplace(num);
                if (max_heap.size() > min_heap.size()) {
                    min_heap.emplace(max_heap.top());
                    max_heap.pop();
                }
            }

            ans.emplace_back(min_heap.size() == max_heap.size() ?
                             max_heap.top() : min_heap.top());
        }

        return ans;
    }

Solution - leetcode

// Time:  O(nlogn) for total n addNums, O(logn) per addNum, O(1) per findMedian.
// Space: O(n), total space

class MedianFinder {
public:

    // Adds a number into the data structure.
    void addNum(int num) {
        //because min_heap.top is the current median if the total number is odd
        //we try to put num into min_heap first
        if(max_heap.empty() || num > max_heap.top()) {
            min_heap.emplace(num);
            //check balance
            if(min_heap.size() - max_heap.size() > 1) {
                max_heap.emplace(min_heap.top());
                min_heap.pop();
            }
        } else {
            max_heap.emplace(num);
            //check balance
            if(max_heap.size() - min_heap.size() > 0) {
                min_heap.emplace(max_heap.top());
                max_heap.pop();
            }
        }
    }

    // Returns the median of current data stream
    double findMedian() {
        //we make sure min_heap.size() == max_heap.size() or min_heap.size() == max_heap.size()+1
        //so if the total number is odd, median is min_heap.top(), for even, it's the averge of two tops
        if(min_heap.size() == max_heap.size()) {
            return (min_heap.top() + max_heap.top())/2.0;
        } else {
            return min_heap.top();
        }
    }
private:
    priority_queue<int> max_heap;//stores the smaller half
    priority_queue<int, vector<int>, greater<int> > min_heap; //stores the larger part
};

Kamyu104's BST SOLUTION

// BST solution.
class MedianFinder2 {
public:

    // Adds a number into the data structure.
    void addNum(int num) {
        // Balance smaller half and larger half.
        if (max_bst_.empty() || num > *max_bst_.cbegin()) {
            min_bst_.emplace(num);
            if (min_bst_.size() > max_bst_.size() + 1) {
                max_bst_.emplace(*min_bst_.cbegin());
                min_bst_.erase(min_bst_.cbegin());
            }
        } else {
            max_bst_.emplace(num);
            if (max_bst_.size() > min_bst_.size()) {
                min_bst_.emplace(*max_bst_.cbegin());
                max_bst_.erase(max_bst_.cbegin());
            }
        }
    }

    // Returns the median of current data stream
    double findMedian() {
        return min_bst_.size() == max_bst_.size() ?
                   (*max_bst_.cbegin() + *min_bst_.cbegin()) / 2.0 :
                   *min_bst_.cbegin();

    }

private:
   // min_bst_ stores the larger half seen so far.
    multiset<int, less<int>> min_bst_;
    // max_bst_ stores the smaller half seen so far.
    multiset<int, greater<int>> max_bst_;
};

multiset V.S. priority_queue (BST VS heap)

  1. A priority queue only gives you access to one element in sorted order -- i.e., you can get the highest priority item, and when you remove that, you can get the next highest priority, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As @Tadeusz Kopec pointed out, building a heap is also linear on the number of items in the heap, where building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]

  2. A set allows you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set, then traverse in order from one to the other.

Reference

http://blog.csdn.net/tianshuai1111/article/details/7652553

results matching ""

    No results matching ""