Privacy and utility tradeoff

Design a method (algorithm) to realize the privacy-utility tradeoff in data publishing.

  • For the given Adult dataset, two of the 14 attributes are special: "education_num" and "salary". In the application scenario, the dataset is published for social research, and the values of "education_num" and "salary" for each individual are unknown to the researchers. To build a set of decision-assistance tools, researchers from the human resources department are expected to predict the value of "education_num" as accurately as possible. However, because "salary" is sensitive and may reveal private employee information, those researchers should be able to predict it only as inaccurately as possible.
    • Considering that you are the data publisher, you are required to inject noise (any type of noise you want, not limited to differentially private noise) into this dataset prior to publishing; see the noise-injection sketch after this list.
    • The objective is clear: the researchers are expected to predict the value of "education_num" as accurately as possible (this is utility) but to predict "salary" as inaccurately as possible (this is privacy).
    • The performance of your method will be evaluated by the tradeoff between privacy and utility, for example the ratio utility/privacy; see the evaluation sketch after this list.
    • Hint: do not forget data dependency; the attributes are statistically correlated, so perturbing one attribute changes what can be inferred about the others.
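
One possible noise-injection strategy, sketched below in Python, exploits exactly that data dependency: perturb the attributes that are strongly linked to "salary" but only weakly linked to "education_num", and leave the education-relevant attributes intact. This is a minimal sketch, not the required solution; the file name `adult.csv`, the underscore-style column names, the `salary_linked_*` split, and the noise scales are all illustrative assumptions (in practice the split should be derived from measured dependencies, e.g. mutual information with each special attribute).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Assumed file name and underscore-style column names, matching the
# assignment's naming; adjust to your local copy of the Adult dataset.
df = pd.read_csv("adult.csv")

# Illustrative assumption: attributes thought to drive "salary" much more
# than "education_num". Derive this split from the data in a real solution.
salary_linked_numeric = ["capital_gain", "capital_loss", "hours_per_week"]
salary_linked_categorical = ["occupation"]

published = df.copy()

# Laplace noise on numeric salary-linked columns; a larger scale b means
# more privacy (salary harder to infer) at some cost in utility.
for col in salary_linked_numeric:
    b = 0.5 * published[col].std()
    published[col] = published[col] + rng.laplace(0.0, b, size=len(published))

# Randomized response on categorical salary-linked columns: with
# probability p, replace the true value with a uniformly random one.
p = 0.3
for col in salary_linked_categorical:
    mask = rng.random(len(published)) < p
    published.loc[mask, col] = rng.choice(df[col].unique(), size=int(mask.sum()))

published.to_csv("adult_published.csv", index=False)
```

Sweeping the knobs b and p traces out the tradeoff curve: each (b, p) pair yields one (utility, privacy) point for the evaluation below.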
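To score the tradeoff, the publisher can simulate the researchers' attack on the published data: train one classifier per special attribute using the remaining 12 attributes as features, then compare the two accuracies. Again a hedged sketch under assumptions: the random-forest attacker, the 70/30 split, and the file name `adult_published.csv` (the output of the sketch above) are choices made here for illustration, not part of the assignment.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

HIDDEN = ["education_num", "salary"]  # unknown to researchers in the scenario

def attack_accuracy(df: pd.DataFrame, target: str) -> float:
    """How well a researcher could predict `target` from the other 12 attributes."""
    X = pd.get_dummies(df.drop(columns=HIDDEN))  # one-hot encode categoricals
    y = df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

published = pd.read_csv("adult_published.csv")

utility = attack_accuracy(published, "education_num")  # higher is better
leakage = attack_accuracy(published, "salary")         # lower is better

print(f"utility  (education_num accuracy): {utility:.3f}")
print(f"privacy  (salary accuracy):        {leakage:.3f}")
print(f"tradeoff (utility / leakage):      {utility / leakage:.3f}")
```

A random forest is only one plausible attacker model; a fair evaluation should report the tradeoff under the strongest predictor the researchers might realistically use.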