De-Identification of Health Data under Efficient Recommendation using MapReduce
Keywords:
Big Data; Access Control; Privacy-preserving Policy; De-identification Policies
Abstract
Many data owners are required to release their data in a variety of real-world applications,
since it is of great importance to discover the valuable information hidden behind the data. However, existing re-identification attacks
on the AOL and ADULTS data sets have shown that publishing such data directly may pose serious threats to individual
privacy. It is therefore imperative to address every kind of re-identification risk by recommending effective de-identification policies
that preserve both the privacy and the utility of the data. De-identification policies are one of the models that can be used to
meet these requirements; however, the number of de-identification policies is exponentially large due to the broad domain of
quasi-identifier attributes. To manage the trade-off between data utility and data privacy, skyline computation can
be used to select such policies, but efficient skyline processing over a large number of
policies remains challenging. In this paper, we propose a parallel algorithm called SKY-FILTER-MR, which is based on
MapReduce, to overcome this challenge by computing skylines over large-scale de-identification policies represented as bit-strings.
To improve performance, a novel approximate skyline computation scheme is proposed to prune unqualified
policies using the domination relationship. With the approximate skyline, the filtering power in the policy space
generation stage is greatly strengthened, effectively reducing the cost of skyline computation over many policies. Extensive
experiments on both real-world and synthetic datasets demonstrate that the proposed SKY-FILTER-MR algorithm
outperforms the baseline approach, running up to four times faster in the best case, which indicates good scalability over large
policy sets.