A Study on Privacy-Preserving
Data Mining on the Web Content Based Internet Privacy
S. Parthiban
sparthibansoft@gmail.com
Abstract
Privacy-preserving data mining (PPDM) is one of the
newest trends in privacy and security research. It is driven by one of the
major policy issues of the information era - the right to privacy. This
chapter describes the foundations for further research in PPDM on the Web. In
particular, we describe the problems we face in defining what information is
private in data mining. We then describe the basis of PPDM including the
historical roots, a discussion on how privacy can be violated in data mining,
and the definition of privacy preservation in data mining based on users'
personal information and information concerning their collective activity.
Subsequently, we introduce a taxonomy of the existing PPDM techniques and a
discussion on how these techniques are applicable to Web-based applications.
Finally, we suggest some privacy requirements that are related to industrial
initiatives, and point to some technical challenges as future research trends
in PPDM on the Web.
Keywords
Privacy-Preserving Data Mining; Online Privacy;
Privacy Rights; Privacy Regulations; Web-Based Applications; Internet Privacy;
Internet Security; Internet Standards; E-Commerce Privacy; Database
Requirements Analysis; Data Security.
INTRODUCTION
Analyzing what the right to privacy means is fraught with
problems, such as the exact definition of privacy, whether it constitutes a
fundamental right, and whether people are and/or should be concerned with it.
Several definitions of privacy have been given, and they vary according to
context, culture, and environment. For instance, in a seminal paper, Warren
& Brandeis (1890) defined privacy as “the right to be let alone.” Later on,
Westin (1967) defined privacy as “the desire of people to choose freely under
what circumstances and to what extent they will expose themselves, their
attitude, and their behavior to others”. Schoeman (1984) defined privacy as
“the right to determine what (personal) information is communicated to others”
or “the control an individual has over information about himself or herself.”
More recently, Garfinkel (2001) stated that “privacy is about self-possession,
autonomy, and integrity.” On the other hand, Rosenberg (2000) argues that
privacy may not be a right after all but a taste: “If privacy is in the end a
matter of individual taste, then seeking a moral foundation for it – beyond its role in making social institutions possible that we happen to prize – will be no more fruitful than seeking a moral foundation for the taste for truffles.”
The above definitions suggest
that, in general, privacy is viewed as a social and cultural concept. However,
with the ubiquity of computers and the emergence of the Web, privacy has also
become a digital problem. With the Web revolution and the emergence of data
mining, privacy concerns have posed technical challenges fundamentally
different from those that occurred before the information era. In the
information technology era, privacy refers to the right of users to conceal
their personal information and have some degree of control over the use of any
personal information disclosed to others (Cockcroft & Clutterbuck, 2001).
In the context of data mining,
the definition of privacy preservation is still unclear, and there is very
little literature related to this topic. A notable exception is the work
presented in (Clifton et al., 2002), in which PPDM is defined as “getting valid
data mining results without learning the underlying data values.” However, at
this point, each existing PPDM technique has its own privacy definition. Our
primary concern in PPDM is that mining algorithms should be analyzed for the
side effects they incur on data privacy. We define PPDM as the dual goal of
meeting privacy requirements and providing valid data mining results.
THE BASIS OF
PRIVACY-PRESERVING DATA MINING
Historical Roots
The debate on PPDM has received
special attention as data mining has been widely adopted by public and private
organizations. We have witnessed three major landmarks that characterize the
progress and success of this new research area: the conceptive landmark,
the deployment landmark, and the prospective landmark. We
describe these landmarks as follows:
The
Conceptive landmark characterizes the
period in which central figures in the community, such as O'Leary (1991, 1995),
Piatetsky-Shapiro (1995), and others (Klösgen, 1995; Clifton & Marks,
1996), investigated the success of knowledge discovery and some of the
important areas where it can conflict with privacy concerns. The key finding
was that knowledge discovery can open new threats to informational privacy and
information security if not done or used properly. The Deployment landmark
is the current period in which an increasing number of PPDM techniques have
been developed and have been published in refereed conferences. The information
available today is spread over countless papers and conference proceedings. The
results achieved in recent years are promising and suggest that PPDM will
achieve the goals that have been set for it.
The Prospective landmark is a new period in which directed efforts toward standardization
occur. At this stage, there is no consensus about what privacy preservation means
in data mining. In addition, there is no consensus on privacy principles,
policies, and requirements as a foundation for the development and deployment
of new PPDM techniques. The excessive number of techniques is leading to
confusion among developers, practitioners, and others interested in this
technology. One of the most important challenges in PPDM now is to establish
the groundwork for further research and development in this area.
Privacy Violation
in Data Mining
Understanding privacy in data mining requires
understanding how privacy can be violated and the possible means for preventing
privacy violation. In general, one major factor contributes to privacy violation
in data mining: the misuse of data.
Users' privacy can be violated
in different ways and with different intentions. Although data mining can be
extremely valuable in many applications (e.g., business, medical analysis,
etc.), it can also, in the absence of adequate safeguards, violate informational
privacy. Privacy can be violated if personal data are used for other purposes
subsequent to the original transaction between an individual and an
organization when the information was collected (Culnan, 1993).
One of the sources of privacy
violation is called data magnets (Rezgui et al., 2003). Data magnets are
techniques and tools used to collect personal data. Examples of data magnets
include explicitly collecting information through on-line registration,
identifying users through IP addresses, software downloads that require
registration, and indirectly collecting information for secondary usage. In
many cases, users may not be aware that information is being collected, or may
not know how it is collected. In particular, collected personal data can be put
to secondary uses largely beyond the users' control and outside the reach of
privacy laws. This scenario has led to an uncontrollable privacy violation
not because of data mining itself, but fundamentally because of the misuse of
data.
Defining Privacy
for Data Mining
In general, privacy preservation
occurs in two major dimensions: users' personal information and information
concerning their collective activity. We refer to the former as individual
privacy preservation and the latter as collective privacy preservation, which
is related to corporate privacy in (Clifton et al., 2002).
·
Individual
privacy preservation: The primary
goal of data privacy is the protection of personally identifiable information.
In general, information is considered personally identifiable if it can be
linked, directly or indirectly, to an individual person. Thus, when personal
data are subjected to mining, the attribute values associated with individuals
are private and must be protected from disclosure. Miners are then able to
learn from global models rather than from the characteristics of a particular
individual.
·
Collective
privacy preservation: Protecting
personal data may not be enough. Sometimes, we may need to protect against
learning sensitive knowledge representing the activities of a group. We refer
to the protection of sensitive knowledge as collective privacy preservation.
The goal here is quite similar to that of statistical databases, in which
security control mechanisms provide aggregate information about groups
(population) and, at the same time, prevent disclosure of confidential
information about individuals. However, unlike statistical databases, another
objective of collective privacy preservation is to protect
sensitive knowledge that can provide competitive advantage in the business
world.
In the case of collective privacy preservation,
organizations have to cope with some interesting conflicts. For instance, when
personal information undergoes analysis processes that produce new facts about
users' shopping patterns, hobbies, or preferences, these facts could be used in
recommender systems to predict or affect their future shopping patterns. In
general, this scenario is beneficial to both users and organizations. However,
when organizations share data in a collaborative project, the goal is not only
to protect personally identifiable information but also sensitive knowledge
represented by some strategic patterns.
Characterizing
Scenarios of Privacy Preservation on the Web
In this section, we describe two
real-life motivating examples in which PPDM poses different constraints:
·
Scenario 1: Suppose we have a server and many clients, in which each
client holds a set of items sold (e.g., books, movies, etc.). The clients
want the server to gather statistical information about associations among
items in order to provide recommendations to the clients. However, the clients
do not want the server to learn some strategic patterns (also called sensitive
association rules). In this context, the clients represent companies, and the
server is a recommendation system for an e-commerce application built as the
fruit of the clients' collaboration. In the absence of ratings, which are used in
collaborative filtering for automatic recommendation building, association
rules can be effectively used to build models for on-line recommendation. When
a client sends its frequent itemsets or association rules to the server, it
must protect the sensitive itemsets according to some specific policies. The
server then gathers statistical information from the non-sensitive itemsets and
recovers from them the actual associations. How can these companies benefit
from such collaboration by sharing association rules while keeping some
sensitive association rules private?
·
Scenario 2: Two organizations, an Internet marketing company and
an on-line retail company, have datasets with different attributes for a common
set of individuals. These organizations decide to share their data for
clustering, in order to find the optimal customer targets and maximize return on
investments. How can these organizations learn about their clusters using each
other's data without learning anything about the attribute values of each
other?
Note
that the above scenarios describe different privacy preservation problems. Each
scenario poses a set of challenges. For instance, scenario 1 is a typical
example of collective privacy preservation, while scenario 2 refers to
individual privacy preservation.
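As a rough illustration of scenario 1, the sketch below (in Python, with hypothetical item names and support counts) shows the client-side step in which sensitive itemsets are withheld, according to a local policy, before the frequent itemsets are sent to the recommendation server.

# Hypothetical sketch: a client keeps sensitive itemsets out of the
# frequent itemsets it shares with the recommendation server.

# Frequent itemsets mined locally, mapped to their support counts.
frequent_itemsets = {
    frozenset({"book_A", "book_B"}): 120,
    frozenset({"book_A", "movie_X"}): 95,
    frozenset({"book_C"}): 300,
}

# Local policy: itemsets the company considers strategic and will not disclose.
sensitive_itemsets = [frozenset({"book_A", "movie_X"})]

def releasable(itemsets, sensitive):
    """Keep only the itemsets that do not contain any sensitive itemset."""
    return {
        iset: support
        for iset, support in itemsets.items()
        if not any(s <= iset for s in sensitive)  # s <= iset is a subset test
    }

shared_with_server = releasable(frequent_itemsets, sensitive_itemsets)
# Only {book_A, book_B} and {book_C} are sent; the strategic association
# between book_A and movie_X never leaves the client.

The server then builds its recommendation model only from the itemsets it actually receives.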
A TAXONOMY OF
EXISTING PPDM TECHNIQUES
In this section, we classify the
existing PPDM techniques in the literature into four major categories: data
partitioning, data modification, data restriction, and data ownership as can be
seen in Figure 1.
Data
Partitioning Techniques
Data partitioning techniques
have been applied to some scenarios in which the databases available for mining
are distributed across a number of sites, with each site only willing to share
data mining results, not the source data. In these cases, the data are
distributed either horizontally or vertically. In a horizontal partition,
different entities are described with the same schema in all partitions, while
in a vertical partition the attributes of the same entities are split across
the partitions. The existing solutions can be classified into
Cryptography-Based Techniques and Generative-Based Techniques.
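To make the horizontal and vertical distinction concrete before describing these solutions, the small Python sketch below (with hypothetical customer records) shows the same table partitioned horizontally, where each site holds complete rows for different customers, and vertically, where each site holds different attributes for the same customers.

# Hypothetical customer records used only to illustrate the two partitions.
full_table = [
    {"id": 1, "age": 34, "zip": "10115", "purchases": 12},
    {"id": 2, "age": 51, "zip": "20095", "purchases": 3},
    {"id": 3, "age": 27, "zip": "80331", "purchases": 8},
]

# Horizontal partition: same schema at every site, different entities.
site_A_rows = full_table[:2]   # customers 1 and 2
site_B_rows = full_table[2:]   # customer 3

# Vertical partition: same entities at every site, different attributes,
# linked by the common identifier "id".
site_A_cols = [{"id": r["id"], "age": r["age"]} for r in full_table]
site_B_cols = [{"id": r["id"], "zip": r["zip"], "purchases": r["purchases"]}
               for r in full_table]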
·
Cryptography-Based
Techniques: In the context of PPDM
over distributed data, cryptography-based techniques have been developed to
solve problems of the following nature: two or more parties want to conduct a
computation based on their private inputs. The issue here is how to conduct
such a computation so that no party knows anything except its own input and the
results. This problem is referred to as the Secure Multi-Party Computation
(SMC) problem (Goldreich, Micali, & Wigderson, 1987); a minimal secure-sum
sketch of this idea is given after this list. The technique
proposed in (Lindell & Pinkas, 2000) addresses privacy-preserving
classification, while the techniques proposed in (Kantarcioğlu & Clifton,
2002; Vaidya & Clifton, 2002) address privacy-preserving association rule
mining, and the technique in (Vaidya & Clifton, 2003) addresses
privacy-preserving clustering.
·
Generative-Based
Techniques: These techniques are
designed to perform distributed mining tasks. In this approach, each party
shares just a small portion of its local model, which is used to construct the
global model. The existing solutions are built over horizontally partitioned
data. The solution presented in (Veloso et al., 2003) addresses privacy-preserving
mining of frequent itemsets in distributed databases, whereas the solution in (Meregu
& Ghosh, 2003) addresses privacy-preserving distributed clustering using
generative models.
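A classic building block behind such cryptography-based protocols is secure sum, in which several parties learn the total of their private values (for example, local support counts of an itemset) without revealing any individual value. The Python sketch below simulates one secure-sum round in a single function for readability; it only illustrates the masking idea, whereas real deployments rely on the SMC constructions cited above.

import random

MODULUS = 2 ** 32  # all values are summed modulo a large constant

def secure_sum(private_values):
    """Simulated secure-sum round: the initiator adds a random mask, the
    masked running total is passed from party to party, and the mask is
    removed at the end. Intermediate totals look random to each party."""
    mask = random.randrange(MODULUS)
    running = mask
    for value in private_values:            # each party adds its own value
        running = (running + value) % MODULUS
    return (running - mask) % MODULUS       # the initiator removes its mask

# Three parties holding private local support counts of the same itemset.
local_counts = [120, 85, 42]
print(secure_sum(local_counts))             # prints 247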
Data Modification
Techniques
Data modification techniques
modify the original values of a database that needs to be shared, and in doing
so, privacy preservation is ensured. The transformed database is made available
for mining and must meet privacy requirements without losing the benefit of
mining. In general, data modification techniques aim at finding an appropriate
balance between privacy preservation and knowledge disclosure. Methods for data
modification include noise addition techniques and space transformation
techniques.
·
Noise Addition
Techniques: The idea behind noise
addition techniques for PPDM is that some noise (e.g., information not present
in a particular tuple or transaction) is added to the original data to prevent
the identification of confidential information relating to a particular
individual. In other cases, noise is added to confidential attributes by
randomly shuffling the attribute values to prevent the discovery of some
patterns that are not supposed to be discovered. We categorize noise addition
techniques into three groups: (1) data swapping techniques that interchange the
values of individual records in a database (Estivill-Castro & Brankovic,
1999); (2) data distortion techniques that perturb the data to preserve
privacy, and the distorted data maintain the general distribution of the
original data (Agrawal & Srikant, 2000); and (3) data randomization
techniques, which allow one to discover general patterns in a database within
an error bound while protecting individual values. Like data swapping and data
distortion techniques, randomization techniques are designed to find a good
compromise between privacy protection and knowledge discovery (Evfimievski et
al., 2002; Rizvi & Haritsa, 2002; Zang, Wang, & Zhao, 2004). A minimal
value-distortion sketch appears after this list.
·
Space
Transformation Techniques: These
techniques are specifically designed to address privacy-preserving clustering.
These techniques are designed to protect the underlying data values subjected
to clustering without jeopardizing the similarity between objects under
analysis. Thus, a space transformation technique must not only meet privacy
requirements but also guarantee valid clustering results. We categorize space
transformation techniques into two major groups: (1) object similarity-based
representation relies on the similarity between objects, i.e., a data owner
could share some data for clustering analysis by simply computing the
dissimilarity matrix (matrix of distances) between the objects and then sharing
such a matrix with a third party (a small sketch is given after this list). Many
clustering algorithms in the literature operate on a dissimilarity matrix (Han
& Kamber, 2001). This solution is simple to implement and secure, but incurs a
high communication cost (Oliveira & Zaïane, 2004); (2) dimensionality
reduction-based transformation can be used to address privacy-preserving
clustering when the attributes of objects are available either in a central
repository or vertically partitioned across many sites. By reducing the
dimensionality of a dataset to a sufficiently small value, one can find a
trade-off between privacy, communication cost, and accuracy. Once the
dimensionality of a database is reduced, the released database preserves (or
slightly modifies) the distances between data points. In tandem with the
benefit of preserving the similarity between data points, this solution protects
individuals' privacy since the attribute values of the objects in the
transformed data are completely different from those in the original data
(Oliveira & Zaïane, 2004).
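As a minimal illustration of value distortion in the spirit of the noise addition techniques above, the sketch below adds zero-mean Gaussian noise to a confidential numerical attribute before release; the aggregate distribution can still be estimated from the perturbed values, while no released value can be read as an individual's exact value. The attribute, the values, and the noise level are hypothetical.

import random

def distort(values, sigma=5.0):
    """Perturb each value with zero-mean Gaussian noise of std. dev. sigma."""
    return [v + random.gauss(0.0, sigma) for v in values]

ages = [34, 51, 27, 45, 38]        # original confidential attribute values
released = distort(ages)           # perturbed values that may be shared

# The mean of the released values approximates the true mean, so general
# patterns survive, while individual values are hidden by the noise.
print(sum(ages) / len(ages), sum(released) / len(released))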
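For the object similarity-based representation, a data owner shares only pairwise distances rather than raw attribute values, as in the sketch below (hypothetical two-dimensional records); many clustering algorithms can then operate directly on such a matrix.

import math

# Hypothetical numerical records whose raw attribute values must stay private.
records = [(34.0, 12.0), (51.0, 3.0), (27.0, 8.0)]

def dissimilarity_matrix(points):
    """Euclidean distance between every pair of objects."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

# Only this matrix of distances is handed to the third party that performs
# the clustering; the attribute values themselves are never disclosed.
shared_matrix = dissimilarity_matrix(records)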
Data Restriction
Techniques
Data restriction techniques
focus on limiting the access to mining results through either generalization or
suppression of information (e.g., items in transactions, attributes in
relations), or even by blocking the access to some patterns that are not
supposed to be discovered. Such techniques can be divided into two groups:
Blocking-based techniques and Sanitization-based techniques.
·
Blocking-Based
Techniques: These techniques aim at
hiding some sensitive information when data are shared for mining. The private
information includes sensitive association rules and classification rules that
must remain private. Before releasing the data for mining, data owners must
consider how much information can be inferred or calculated from large
databases, and must look for ways to minimize the leakage of such information.
In general, with blocking-based techniques, patterns are recovered less
frequently than in the original data, since sensitive information is either
suppressed or replaced with unknowns to preserve privacy (a minimal sketch of
this replacement appears after this list). The techniques in (Johnsten &
Raghavan, 2001) address privacy preservation in classification, while the
techniques in (Johnsten & Raghavan, 2002; Saygin, Verykios, & Clifton,
2001) address privacy-preserving association rule mining.
·
Sanitization-Based
Techniques: Unlike blocking-based
techniques that hide sensitive information by replacing some items or attribute
values with unknowns, sanitization-based techniques hide sensitive information
by strategically suppressing some items in transactional databases, or even by
generalizing information to preserve privacy in classification. These
techniques can be categorized into two major groups: (1) data-sharing
techniques in which the sanitization process acts on the data to remove or hide
the group of sensitive association rules that contain sensitive knowledge. To do
so, a small number of transactions that contain the sensitive rules have to be
modified by deleting one or more items from them or even adding some noise,
i.e., new items not originally present in such transactions (Verykios et al.,
2004; Dasseni et al., 2001; Oliveira & Zaïane, 2002, 2003a, 2003b); and (2)
pattern-sharing techniques in which the sanitizing algorithm acts on the rules
mined from a database, instead of the data itself. The existing solution
removes all sensitive rules before the sharing process and blocks some
inference channels (Oliveira, Zaïane, & Saygin, 2004). In the context of
predictive modeling, a framework was proposed in (Iyengar, 2002) for preserving
the anonymity of individuals or entities when data are shared or made publicly
available. A minimal item-removal sketch of the data-sharing approach follows
this list.
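The replacement with unknowns behind blocking-based techniques can be sketched very simply: the values of a confidential attribute are replaced by an unknown marker ("?") before the data are released, as in the hypothetical Python example below.

# Hypothetical records; "diagnosis" is the confidential attribute to block.
records = [
    {"age": 34, "zip": "10115", "diagnosis": "diabetes"},
    {"age": 51, "zip": "20095", "diagnosis": "healthy"},
]

def block_attribute(rows, attribute, unknown="?"):
    """Replace the given attribute's values with an unknown marker."""
    return [{**row, attribute: unknown} for row in rows]

released = block_attribute(records, "diagnosis")
# The released rows still support mining on the remaining attributes, but
# patterns involving the blocked attribute can only be recovered, if at
# all, with lower frequency than in the original data.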
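A data-sharing sanitization step can be sketched as follows: every transaction supporting a sensitive itemset has one of its items removed before the database is released. The transactions, the sensitive itemset, and the choice of victim item are hypothetical; the algorithms cited above choose victims so as to minimize side effects on non-sensitive patterns.

# Hypothetical transactional database and a sensitive itemset to hide.
transactions = [
    {"bread", "milk", "beer"},
    {"bread", "beer"},
    {"milk", "eggs"},
]
sensitive_itemset = {"bread", "beer"}
victim_item = "beer"   # item deleted from the sensitive transactions

def sanitize(db, sensitive, victim):
    """Remove the victim item from every transaction that contains the
    sensitive itemset, so the corresponding rules fall below threshold."""
    return [t - {victim} if sensitive <= t else set(t) for t in db]

released_db = sanitize(transactions, sensitive_itemset, victim_item)
# The itemset {bread, beer} no longer occurs in the released database, so
# sensitive rules derived from it cannot be mined from it.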
Data Ownership
Techniques
Data ownership techniques can be
applied to two different scenarios: (1) to protect the ownership of data by
people about whom the data were collected (Felty & Matwin, 2002). The idea
behind this approach is that a data owner may prevent the data from being used
for some purposes and allow them to be used for other purposes. To accomplish
that, this solution is based on encoding permissions on the use of data as
theorems about programs that process and mine the data. Theorem proving
techniques are then used to guarantee that these programs comply with the
permissions; and (2) to identify the entity that receives confidential data
when such data are shared or exchanged (Mucsi-Nagy & Matwin, 2004). When
sharing or exchanging confidential data, this approach ensures that no one can
read confidential data except the receiver(s). It can be used in different
scenarios, such as statistical or research purposes, data mining, and on-line
business-to-business (B2B) interactions.
Are These Techniques
Applicable to Web Data?
After describing the existing
PPDM techniques, we now move on to analyze which of these techniques are
applicable to Web data. To do so, hereinafter we use the following notation:
·
WDT: these techniques are designed essentially to support
Web usage mining, i.e., the techniques address Web data applications only. We
refer to these techniques as Web Data Techniques (WDT).
·
GPT: these techniques can be used to support both public
data release and Web-based applications. We refer to these techniques as
General Purpose Techniques (GPT).
a)
Cryptography-Based Techniques: these
techniques can be used to support business collaboration on the Web. Scenario 2
(in Section: The Basis of Privacy-Preserving Data Mining) is a typical example of
a Web-based application that can be addressed by cryptography-based techniques.
Other applications related to e-commerce can be found in (Srivastava et al.,
2000; Kou & Yesha, 2000). Therefore, such techniques are classified as WDT.
b)
Generative-Based Techniques: these
techniques can be applied to scenarios in which the goal is to extract useful
knowledge from large, distributed data repositories. In these scenarios, the
data cannot be directly centralized or unified as a single file or database
due to legal, proprietary, or technical restrictions. In general,
generative-based techniques are designed to support distributed Web-based
applications.
c)
Noise Addition Techniques: these
techniques can be categorized as GPT. For instance, data swapping and data
distortion techniques are used for public data release, while data
randomization could be used to build models for on-line recommendations (Zang
et al., 2004). Scenario 1 (in Section: The Basis of Privacy-Preserving Data
Mining) is a typical example of an on-line recommendation system.
d)
Space Transformation Techniques:
these are general purpose techniques (GPT). These techniques could be used to
promote social benefits as well as to address applications on the Web (Oliveira
& Zaïane, 2004). An example of social benefit occurs, for instance, when a
hospital shares some data for research purposes (e.g., cluster of patients with
the same diseases). Space transformation techniques can also be used when the
data mining process is outsourced or even when the data are distributed across
many sites.
e)
Blocking-Based Techniques: in
general, these techniques are applied to protect sensitive information in
databases. They could be used to simulate an access control mechanism in a database in
which some information is hidden from users who do not have the right to access
it. However, these techniques can also be used to suppress confidential
information before the release of data for mining. We classify such techniques
as GPT.
f)
Sanitization-Based Techniques: Like
blocking-based techniques, sanitization-based techniques can be used by
statistical offices that publish sanitized versions of data (e.g., the census
problem). In addition, sanitization-based techniques can be used to build models
for on-line recommendations, as described in Scenario 1 (in Section: The Basis
of Privacy-Preserving Data Mining).
g)
Data Ownership Techniques: These
techniques implement a mechanism enforcing data ownership by the individuals to
whom the data belongs. When sharing confidential data, these techniques can
also be used to ensure that no one can read confidential data except the
receiver(s) that are authorized to do so. The most evident applications of such
techniques are related to Web mining and on-line business-to-business (B2B)
interactions.
Table 1 shows a summary of
the PPDM techniques and their relationship with Web data applications.
PPDM Techniques                  | Category
Cryptography-Based Techniques    | WDT
Generative-Based Techniques      | WDT
Noise Addition Techniques        | GPT
Space Transformation Techniques  | GPT
Blocking-Based Techniques        | GPT
Sanitization-Based Techniques    | GPT
Data Ownership Techniques        | WDT
Table 1: A summary of the PPDM techniques and their
relationship with Web data.
REQUIREMENTS FOR
TECHNICAL SOLUTIONS
Requirements for the development
of technical solutions
Ideally, a technical solution for a PPDM scenario
would enable us to enforce privacy safeguards and to control the sharing and
use of personal data. However, such a solution raises some crucial questions:
·
What levels of effectiveness
are in fact technologically possible and what corresponding regulatory measures
are needed to achieve these levels?
·
What degrees of
privacy and anonymity must be sacrificed to achieve valid data mining results?
These questions cannot have
“yes-no” answers, but involve a range of technological possibilities and social
choices. The worst response to such questions is to ignore them completely and
not pursue the means by which we can eventually provide informed answers. The
above questions can be to some extent addressed if we provide some key
requirements to guide the development of technical solutions.
The following key words are
used to specify the extent to which an item is a requirement for the
development of technical solutions to address PPDM:
·
Must: this word means that the item is an absolute
requirement;
·
Should: this word means that there may exist valid reasons
not to treat this item as a requirement, but the full implications should be
understood and the case carefully weighed before discarding this item.
a) Independence: A promising solution for the problem of PPDM, for
any specific data mining task (e.g., association rules, clustering, and
classification), should be independent of the particular mining algorithm.
b) Accuracy: Whenever possible, an effective solution should do better than
simply trading privacy against the accuracy of the disclosed data mining
results. Sometimes a trade-off must be found, as in scenario 2 (in Section: The
Basis of Privacy-Preserving Data Mining).
c) Privacy Level: This is also a fundamental requirement in PPDM. A
technical solution must ensure that the mining process does not violate privacy
beyond a certain degree of security.
d) Attribute
Heterogeneity: A technical solution
for PPDM should handle heterogeneous attributes (e.g., categorical and
numerical).
e) Communication Cost: When addressing data distributed across many sites,
a technical solution should consider carefully issues of communication cost.
Requirements to guide
the deployment of technical solutions
Information technology vendors in the near future
will offer a variety of products which claim to help protect privacy in data
mining. How can we evaluate and decide whether what is being offered is useful?
The nonexistence of proper instruments to evaluate the usefulness and
feasibility of a solution to address a PPDM scenario challenges us to identify
the following requirements:
a) Privacy
Identification: We should identify
what information is private. Is the technical solution aiming at protecting
individual privacy or collective privacy?
b) Privacy Standards: Does the technical solution comply with
international instruments that state and enforce rules (e.g., principles and/or
policies) for use of automated processing of private information?
c) Privacy Safeguards: Is it possible to record what has been done with
private information and to be transparent with the individuals to whom the
private information pertains?
d) Disclosure Limitation: Are there metrics to measure how much private
information is disclosed? Since privacy has many meanings depending on the
context, we may require a set of metrics to do so. What is most important is
that we need to measure not only how much private information is disclosed, but
also the impact of a technical solution on the data and on valid mining
results.
e) Update Match: When a new technical solution is launched, two
aspects should be considered: i) the solution should comply with existing
privacy principles and policies; ii) in case of modifications to privacy
principles and/or policies that guide the development of technical solutions,
any release should consider these new modifications.
FUTURE RESEARCH TRENDS
Preserving privacy on the Web has an important impact
on many Web activities and Web applications. In particular, privacy issues have
attracted a lot of attention due to the growth of e-commerce and e-business.
These issues are further complicated by the global and self-regulatory nature
of the Web.
Privacy issues on the Web
are based on the fact that most users want to maintain strict anonymity on Web
applications and activities. The ease of access to information on the Web,
coupled with the ready availability of personal data, has also made it easier and more
tempting for interested parties (e.g., businesses and governments) to willingly
or inadvertently intrude on individuals' privacy in unprecedented ways.
Clearly, privacy over Web data is an umbrella issue that encompasses many Web
applications, such as e-commerce, stream data mining, and multimedia mining,
among others. In this work, we focus on issues that lay the foundation for
further research in PPDM on the Web, because these issues will certainly play a
significant role in the future of
this new area. In particular, a common framework for PPDM should be conceived,
notably in terms of definitions, principles, policies, and requirements. The
advantages of a framework of that nature are as follows: (a) a common framework
will avoid confusing developers, practitioners, and many others interested in
PPDM on the Web; (b) adoption of a common framework will avoid inconsistent and
duplicated efforts, and will enable vendors and developers to make solid
advances in future research in PPDM on the Web.
The success of a framework
of this nature can only be guaranteed if it is backed up by a legal framework,
such as the Platform for Privacy Preferences (P3P) Project (Joseph & Faith,
1999). This project is emerging as an industry standard providing a simple,
automated way for users to gain more control over the use of personal
information on Web sites they visit.
The European Union has
taken a lead in setting up a regulatory framework for Internet Privacy and has
issued a directive that sets guidelines for processing and transfer of personal
data (European Commission, 1998).
CONCLUSION
In this chapter, we have
laid down the foundations for further research in the area of
Privacy-Preserving Data Mining (PPDM) on the Web. Although our work described
in this chapter is preliminary and conceptual in nature, it is a vital prerequisite
for the development and deployment of new techniques. In particular, we
described the problems we face in defining what information is private in data
mining. We then described the basis of PPDM including the historical roots, a
discussion on how privacy can be violated in data mining, and the definition of
privacy preservation in data mining based on users' personal information and
information concerning their collective activity. We also introduced a taxonomy
of the existing PPDM techniques and a discussion on how these techniques are
applicable to Web data. Subsequently, we suggested some desirable privacy
requirements that are related to industrial initiatives. These requirements are
essential for the development and deployment of technical solutions. Finally,
we pointed to standardization issues as a technical challenge for future
research trends in PPDM on the Web.
REFERENCES
Agrawal, R., & Srikant,
R. (2000). Privacy-Preserving Data Mining. In Proc. of the 2000 ACM SIGMOD
International Conference on Management of Data (pp. 439-450). Dallas, Texas.
Clifton, C., Kantarcioğlu,
M., & Vaidya, J. (2002). Defining Privacy For Data Mining. In Proc. of the
National Science Foundation Workshop on Next Generation Data Mining (pp.
126-133). Baltimore, MD, USA.
Clifton, C., & Marks,
D. (1996). Security and Privacy Implications of Data Mining. In Proc. of the
Workshop on Data Mining and Knowledge Discovery (pp. 15-19). Montreal, Canada.
Cockcroft, S., &
Clutterbuck, P. (2001). Attitudes Towards Information Privacy. In Proc. of the
12th Australasian Conference on Information Systems. Coffs Harbour, NSW,
Australia.
Culnan, M. J. (1993). How
Did They Get My Name?: An Exploratory Investigation of Consumer Attitudes
Toward Secondary Information. MIS Quarterly, 17 (3), 341-363.
Dasseni, E., Verykios, V.
S., Elmagarmid, A. K., & Bertino, E. (2001). Hiding Association Rules by
Using Confidence and Support. In Proc. of the 4th Information Hiding Workshop
(pp. 369-383). Pittsburgh, PA.
Estivill-Castro, V., &
Brankovic, L. (1999). Data Swapping: Balancing Privacy Against Precision in
Mining for Logic Rules. In Proc. of Data Warehousing and Knowledge Discovery
DaWaK-99 (pp. 389-398). Florence, Italy.
European Commission. (1998).
The directive on the protection of individuals with regard to the processing of
personal data and on the free movement of such data, 1998. Available at
http://www2.echo.lu.
Evfimievski, A., Srikant,
R., Agrawal, R., & Gehrke, J. (2002). Privacy Preserving Mining of
Association Rules. In Proc. of the 8th ACM SIGKDD Intl. Conf. on Knowledge
Discovery and Data Mining (pp. 217-228). Edmonton, AB, Canada.
Felty, A. P., & Matwin,
S. (2002). Privacy-Oriented Data Mining by Proof Checking. In Proc. of the 6th
European Conference on Principles of Data Mining and Knowledge Discovery (PKDD)
(pp. 138-149). Helsinki, Finland.
Garfinkel, S. (2001).
Database Nation: The Death of Privacy in the 21st Century.
O'Reilly & Associates, Sebastopol, CA, USA.
Goldreich, O., Micali, S.,
& Wigderson, A. (1987). How to Play Any Mental Game - A Completeness
Theorem for Protocols with Honest Majority. In Proc. of the 19th Annual ACM
Symposium on Theory of Computing (pp. 218-229). New York City, USA.
Han, J., & Kamber, M.
(2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San
Francisco, CA.
Iyengar, V. S. (2002).
Transforming Data to Satisfy Privacy Constraints. In Proc. of the 8th ACM
SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (pp. 279-288).
Edmonton, AB, Canada.
Johnsten, T., &
Raghavan, V. V. (2001). Security Procedures for Classification Mining
Algorithms. In Proc. of 15th Annual IFIP WG 11.3 Working Conference on Database
and Applications Security (pp. 293-309). Niagara on the Lake, Ontario, Canada.
Johnsten, T., &
Raghavan, V. V. (2002). A Methodology for Hiding Knowledge in Databases. In
Proc. of the IEEE ICDM Workshop on Privacy, Security, and Data Mining (pp.
9-17). Maebashi City, Japan.
Joseph, R., & Faith, C.
L. (1999). The Platform for Privacy Preferences. Communications of the ACM, 42 (2), 48-55.
Kantarcioğlu, M., &
Clifton, C. (2002). Privacy-Preserving Distributed Mining of Association Rules
on Horizontally Partitioned Data. In Proc. of The ACM SIGMOD Workshop on
Research Issues on Data Mining and Knowledge Discovery. Madison, Wisconsin.
Klösgen, W. (1995). KDD:
Public and Private Concerns. IEEE EXPERT, 10 (2), 55-57.
Kou, W., & Yesha, Y.
(2000). Electronic Commerce Technology Trends: Challenges and Opportunities.
IBM Press Alliance Publisher: David Uptmor, IIR Publications, Inc.
Lindell, Y., & Pinkas,
B. (2000). Privacy Preserving Data Mining. In CRYPTO 2000, Springer-Verlag
(LNCS 1880) (pp. 36-54). Santa Barbara, CA.
Meregu, S., & Ghosh, J.
(2003). Privacy-Preserving Distributed Clustering Using Generative Models. In
Proc. of the 3rd IEEE International Conference on Data Mining (ICDM'03) (pp.
211-218). Melbourne, Florida, USA.
Mucsi-Nagy, A., &
Matwin, S. (2004). Digital Fingerprinting for Sharing of Confidential Data. In
Proc. of the Workshop on Privacy and Security Issues in Data Mining (pp. 11-26).
Pisa, Italy.
O'Leary, D. E. (1991).
Knowledge Discovery as a Threat to Database Security. In G. Piatetsky-Shapiro
and W. J. Frawley (editors): Knowledge Discovery in Databases. AAAI/MIT Press,
pages 507-516, Menlo Park, CA.
O'Leary, D. E. (1995). Some
Privacy Issues in Knowledge Discovery: The OECD Personal Privacy Guidelines.
IEEE EXPERT, 10 (2), 48-52.
Oliveira, S. R. M., &
Zaïane, O. R. (2002). Privacy Preserving Frequent Itemset Mining. In Proc. of
the IEEE ICDM Workshop on Privacy, Security, and Data Mining (pp. 43-54).
Maebashi City, Japan.
Oliveira, S. R. M., &
Zaïane, O. R. (2003a). Algorithms for Balancing Privacy and Knowledge Discovery
in Association Rule Mining. In Proc. of the 7th International Database
Engineering and Applications Symposium (IDEAS'03) (pp. 54-63). Hong Kong,
China.
Oliveira, S. R. M., &
Zaïane, O. R. (2003b). Protecting Sensitive Knowledge By Data Sanitization. In
Proc. of the 3rd IEEE International Conference on Data Mining (ICDM'03) (pp.
613-616). Melbourne, Florida, USA.
Oliveira, S. R. M., &
Zaïane, O. R. (2004). Privacy-Preserving Clustering by Object Similarity-Based
Representation and Dimensionality Reduction Transformation. In Proc. of the
Workshop on Privacy and Security Aspects of Data Mining (PSADM'04) in conjunction
with the Fourth IEEE International Conference on Data Mining (ICDM'04) (pp.
21-30). Brighton, UK.
Oliveira, S. R. M., Zaïane,
O. R., & Saygin, Y. (2004). Secure Association Rule Sharing. In Proc. of
the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD'04) (pp. 74-85). Sydney, Australia.
Piatetsky-Shapiro, G.
(1995). Knowledge Discovery in Personal Data vs. Privacy: A Mini-Symposium.
IEEE Expert, 10 (2), 46-47.
Rezgui, A., Bouguettaya,
A., & Eltoweissy, M. Y. (2003). Privacy on the Web: Facts, Challenges, and
Solutions. IEEE Security & Privacy, 1 (6), 40-49.
Rizvi, S. J., &
Haritsa, J. R. (2002). Maintaining Data Privacy in Association Rule Mining. In
Proc. of the 28th International Conference on Very Large Data Bases.
Hong Kong, China.
Rosenberg, A. (2000).
Privacy as a Matter of Taste and Right. In E. F. Paul, F. D. Miller, and J.
Paul, editors, The Right to Privacy, pages 68-90, Cambridge University Press.
Saygin, Y., Verykios, V.
S., & Clifton, C. (2001). Using Unknowns to Prevent Discovery of
Association Rules. SIGMOD Record, 30 (4), 45-54.
Schoeman, F. D. (1984).
Philosophical Dimensions of Privacy, Cambridge Univ. Press.
Srivastava, J., Cooley, R.,
Deshpande, M., & Tan, P.-N. (2000). Web Usage Mining: Discovery and
Applications of Usage Patterns from Web Data. SIGKDD Explorations, 1 (2),
12-23.
Vaidya, J., & Clifton,
C. (2002). Privacy Preserving Association Rule Mining in Vertically Partitioned
Data. In Proc. of the 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and
Data Mining (pp. 639-644). Edmonton, AB, Canada.
Vaidya, J., & Clifton,
C. (2003). Privacy-Preserving K-Means Clustering Over Vertically Partitioned
Data. In Proc. of the 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and
Data Mining (pp. 206-215). Washington, DC, USA.
Veloso, A. A., Meira Jr.,
W., Parthasarathy, S., & Carvalho, M. B. (2003). Efficient, Accurate and
Privacy-Preserving Data Mining for Frequent Itemsets in Distributed Databases.
In Proc. of the 18th Brazilian Symposium on Databases (pp. 281-292). Manaus,
Brazil.
Verykios, V. S.,
Elmagarmid, A. K., Bertino, E., Saygin, Y., & Dasseni, E. (2004).
Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering,
16 (4), 434-447.
Warren, S. D., &
Brandeis, L. D. (1890). The Right to Privacy. Harvard Law Review, 4 (5),
193-220.
Westin, A. F. (1967). Privacy and Freedom, Atheneum.
Zang, N., Wang, S., &
Zhao, W. (2004). A New Scheme on Privacy Preserving Association Rule Mining. In
Proc. of the 15th European Conference on Machine Learning and the 8th European
Conference on Principles and Practice of Knowledge Discovery in Databases.
Pisa, Italy.