International Journal of Research In Science & Engineering, Volume: 1, Special Issue: 2, e-ISSN: 2394-8299, p-ISSN: 2394-8280

RETRIEVAL OF TOP-K FILES BY MULTI-KEYWORD USING TRSE SCHEME OVER ENCRYPTED CLOUD DATA

Manoj Kumar R1, Maria Navin J R2
1 PG Student, Dept. of C.S.E, S.V.C.E, Bengaluru. manojmanumr@gmail.com
2 Asst. Professor, Dept. of CS&E, S.V.C.E, Bengaluru. marianavin.jr@gmail.com

ABSTRACT
Cloud computing is a term that describes a means of delivering information to the end user as a service. Storing sensitive information in the cloud potentially causes privacy problems, and data encryption protects data security only to some extent. The data owner has a collection of n files to outsource to the cloud server in encrypted form. To achieve this, the data owner builds a searchable index from a collection of keywords and then outsources both the encrypted index and the encrypted files to the cloud server. An authorized data user first generates a query request, and the cloud server sends the relevant files to the data user. To eliminate information leakage, a two-round searchable encryption (TRSE) scheme that supports top-k multi-keyword retrieval has been proposed. Homomorphic encryption and the vector space model are employed for ranking. Since ranking is done on the user side, without relying on order-preserving encryption, the efficiency of file retrieval is improved. Files are ranked in order of relevance to the user's interest, and only the files with the highest relevance are sent back to users.

Keywords: Cloud, multi-keyword, vector space model, homomorphic encryption.

1. INTRODUCTION
Data outsourcing is an advanced data service that lets users store sensitive data in a storage space. The sensitive data is managed on remote servers maintained by a trusted third party.
The distributed nature of data management services provides assurances that faulty behaviour can be detected and corrected. This is relevant for outsourced data frameworks, in which data owners place their sensitive data in specialized storage spaces. Data owners outsource their data without assurances of confidentiality and security. Confidentiality can be achieved by encrypting the data, but the challenge is how to enable search and retrieval over such encrypted data.

Several searchable symmetric encryption (SSE) schemes [1] enable both search and retrieval over encrypted data, but each has its own drawbacks. Traditional SSE schemes allow users to retrieve ciphertext securely, but most of them are based on Boolean keyword search [2], [3]. Boolean keyword search returns results based only on whether a keyword exists in a file, without considering how relevant each returned file is to the queried keyword. Some SSE schemes based on order-preserving encryption breach the privacy of sensitive information and allow only a single keyword in a search query [4], [5]. Some multi-keyword search schemes enable secure indexing and ranked search [6], [7], but struggle to strike a balance between security and efficiency.

Therefore, a new searchable encryption scheme called TRSE has been proposed, which uses new techniques from the cryptography and information retrieval (IR) communities. It returns as the retrieval result the files most relevant to the user's requirement: files are ranked in order of relevance, and only highly relevant files are sent back to users. TRSE introduces homomorphic encryption and the vector space model. Since the search operation is performed over encrypted data, information leakage can be eliminated while data can still be searched and retrieved efficiently.

2. RELATED WORK
Searchable symmetric encryption (SSE) allows a client to encrypt data in such a way that it can later search and retrieve the data from the storage server. Given a query, the server can search over the encrypted data and return the appropriate encrypted files. An SSE scheme is considered secure if: (1) the ciphertext reveals no information about the data; (2) the ciphertext together with a search query reveals at most the result of the search; and (3) search queries can be generated only by using the secret key.

IJRISE| www.ijrise.org|editor@ijrise.org[152-156]

Existing searchable encryption schemes [1], [6] allow a user to search securely over encrypted data by keyword without decrypting it. These techniques support only Boolean keyword search, and when applied directly to large-scale, community-wide data outsourcing services they suffer from the following disadvantage: most schemes support either Boolean keyword search or single-keyword search without ranking, and therefore do not return the most relevant data. Ranked search greatly enhances system usability by returning the matching files in ranked order.

One simple form of ranked keyword search is implemented using order-preserving symmetric encryption (OPSE). Order-preserving encryption (OPE) [8] is an encryption scheme whose encryption function preserves the numerical ordering of the plaintexts. OPE not only permits efficient range queries but also allows indexing and query processing to be done exactly as efficiently as for unencrypted data. The main drawback of OPE is that it inevitably leaks data privacy: even though the data are encrypted, the server or an attacker can still obtain information through statistical analysis. This leakage is termed statistical leakage. Homomorphic encryption has been proposed to improve security without sacrificing efficiency.
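The statistical leakage of OPE can be illustrated with a deliberately simplified sketch. Assumption: the "scheme" below is a hypothetical toy OPE (a secret table of strictly increasing random values) and not the construction of [8]; the file names and scores are invented for illustration. It shows that a server holding only ciphertexts can still recover the exact ranking of the files and see which files share a score.

```python
import random

def toy_ope_keygen(domain_size, seed=None):
    """Hypothetical toy OPE key: a secret, strictly increasing table that maps
    plaintexts 0..domain_size-1 to ciphertexts (cumulative random gaps)."""
    rng = random.Random(seed)
    table, c = [], 0
    for _ in range(domain_size):
        c += rng.randint(1, 1000)  # a positive gap keeps the map strictly increasing
        table.append(c)
    return table

def toy_ope_encrypt(table, m):
    """Deterministic encryption: the same plaintext always gives the same ciphertext."""
    return table[m]

# Relevance scores of five files for one keyword (plaintexts known only to the owner).
scores = {"f1": 3, "f2": 7, "f3": 3, "f4": 1, "f5": 9}
key = toy_ope_keygen(domain_size=16, seed=42)
encrypted = {f: toy_ope_encrypt(key, s) for f, s in scores.items()}

# The server never sees plaintexts, yet sorting ciphertexts reproduces the true ranking...
server_ranking = sorted(encrypted, key=encrypted.get, reverse=True)
true_ranking = sorted(scores, key=scores.get, reverse=True)
print(server_ranking == true_ranking)  # True
# ...and equal plaintexts yield equal ciphertexts, so score frequencies leak as well.
print(encrypted["f1"] == encrypted["f3"])  # True
```

This order and frequency information is exactly what an attacker can combine with background knowledge about keyword distributions, which is why the related work above moves from OPE towards homomorphic encryption.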
This paper proposes a novel encryption scheme that ranks the results of a query so that only relevant data are returned.

3. PROPOSED SYSTEM
3.1 System Model
Figure 1 illustrates the architecture of the cloud storage system, which consists of the following entities.

Fig-1: Architecture of Proposed Work

Data owner: The data owner encrypts the keywords and the files and stores them in the cloud, which may be a private or a public cloud.
Data users: Data users are the users to whom the data owner has granted rights to access the files stored in the cloud. The users' authentication details are verified by the cloud server.
Cloud server: The cloud server stores the encrypted files and keywords. It verifies user authentication and allows the Cloud Service Provider to perform operations on the encrypted data.

3.2 TRSE Design
Existing systems use server-side ranking, whereas the proposed system ranks files on the user side. In our scheme, a user can decrypt the data only if the user's identity satisfies the access policy. The data owner is responsible for assigning access policies to different kinds of users. Our mechanism works as follows. The data owner encrypts the searchable index and the data and outsources them to the cloud. A trusted centre is responsible for issuing tokens to users that certify their identity and access policy. When the cloud receives a query consisting of keywords, it calculates the scores from the encrypted index and returns the encrypted scores to the user. The data user then decrypts the scores and picks out the highest-ranked files [6]. These files are requested from the cloud, and once obtained they are decrypted and used by the user.
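The paper does not specify how the trusted centre's tokens are constructed. As one possible sketch, assuming the trusted centre and the cloud share a secret MAC key, a token can carry the user's identity, the files permitted by the access policy and an expiry time, protected by HMAC-SHA256 from Python's standard library. The key, the claim names and the functions `issue_token` and `cloud_verify` are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a MAC key shared by the trusted centre and the cloud server.
SHARED_KEY = b"hypothetical-key-shared-by-trusted-centre-and-cloud"

def issue_token(user_id, allowed_files, expires_at):
    """Trusted centre: certify identity and access policy in a MAC-protected token."""
    claims = {"user": user_id, "files": sorted(allowed_files), "exp": expires_at}
    body = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return body, tag

def cloud_verify(token, file_id, now):
    """Cloud server: accept a request only if the token is authentic,
    unexpired and its access policy covers the requested file."""
    body, tag = token
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or tampered token
    claims = json.loads(body)
    return now < claims["exp"] and file_id in claims["files"]

token = issue_token("alice", {"f1", "f2"}, expires_at=1000)
print(cloud_verify(token, "f1", now=500))   # True
print(cloud_verify(token, "f3", now=500))   # False: not in the access policy
print(cloud_verify(token, "f1", now=1500))  # False: token expired
```

A MAC keeps the sketch short; a deployment in which the cloud should not be able to mint tokens itself would need a digital signature instead, since anyone holding the shared key can create valid tags.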
In this scheme, ranking is done on the data user side, while the score calculations are performed on the server side, which reduces the data user's computational cost. To keep the user-side computation small, most of the computational work should be done on the server side, so an encryption scheme is needed that can operate directly on ciphertext. Such a scheme is called homomorphic encryption. The vector space model uses two attributes, term frequency and document frequency [9]. Term frequency is the number of occurrences of a term in a file; document frequency is the number of files that contain the term. The vector space model is used to score a file against multiple keywords, and files are ranked in order of these scores so that the most relevant files can be obtained.

3.2.1 Homomorphic Encryption Scheme
Homomorphic encryption allows specific types of computation to be carried out on ciphertext. To reduce the computational cost on the user side, computing work is performed on the server side, so an encryption system is needed that guarantees both performance and security on the server. Homomorphic encryption [7] allows computation on ciphertext, without any knowledge of the plaintext, that still yields the correct result: decrypting the output of an operation on ciphertexts gives the same value as performing the corresponding operation on the plaintexts. On the basis of this homomorphic property, the encryption scheme is defined in three stages: KeyGen, Encrypt and Decrypt.
KeyGen: The public key (PK) and secret key (SK) are generated randomly.
Encrypt: A message is encrypted using the secret key.
Decrypt: A ciphertext is decrypted to recover the plaintext.

3.2.2 Framework of TRSE
The framework of TRSE consists of the following algorithms: Setup, Indexbuild, Trapdoorreq, Scoring and Ranking.
Setup: The data owner generates the private and public keys for homomorphic encryption.
Indexbuild: The data owner securely encrypts the searchable index and the data and stores them in the cloud.
Trapdoorreq: The data user generates a multi-keyword query, encrypts it into a trapdoor and sends it to the cloud.
Scoring: When the cloud receives a trapdoor, it computes the score of each file and returns the encrypted scores to the data user.
Ranking: The data user decrypts the scores and then requests the files with the highest scores from the cloud.

The setup phase is run only once, by the data owner, to initialize a particular application. For security, most of the work should be performed by the data owner in the setup phase, with the remaining operations carried out in the cloud. The TRSE algorithms are organized into two phases: the Initialization phase and the Retrieval phase.

Initialization Phase: The Initialization phase consists of Setup and Indexbuild, which involve the cloud server and the data owner. Its details are as follows:
1. The data owner calls KeyGen to generate the public key and secret key for the homomorphic encryption scheme.
2. The data owner extracts the collection of keywords from the files and builds a searchable index for each file.
3. The data owner encrypts the searchable index and outsources it to the cloud server.
4. The data owner encrypts the files with a cryptographic scheme and then outsources them to the cloud server.
5. The access policy and identity information are given to the trusted centre to control data users' access.

Retrieval Phase: The Retrieval phase consists of Tokengen, Trapdoorreq, Scoring and Ranking, which involve the cloud server and the data user. Because computing power on the user side is limited, the computing work is shifted to the server side. At the same time, the confidentiality and privacy of sensitive information must not be violated. According to the previous discussion, the ranking should be left to the user side, while the cloud server still does most of the work without learning any sensitive information.
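The KeyGen, Encrypt and Decrypt stages of Section 3.2.1, and the requirement that the cloud compute without learning plaintexts, can be illustrated with a toy scheme. Assumption: this is a symmetric, DGHV-style "somewhat homomorphic" scheme over the integers with plaintext modulus `T`, chosen for brevity; it is not the concrete scheme of [7] or [9], it has no public key, and its parameters are far too small to be secure.

```python
import random

# Hypothetical toy scheme: symmetric, DGHV-style somewhat homomorphic
# encryption over the integers with plaintext modulus T (insecure sizes).
T = 2**20  # plaintexts are integers in [0, T)

def keygen(rng):
    """KeyGen: the secret key is a large random odd integer."""
    return rng.randrange(2**200, 2**201) | 1

def encrypt(sk, m, rng):
    """Encrypt: hide m under small noise T*r plus a random multiple of sk."""
    r = rng.randrange(0, 2**8)
    q = rng.randrange(2**220, 2**221)
    return m + T * r + sk * q

def decrypt(sk, c):
    """Decrypt: strip the multiple of sk, then the noise multiple of T."""
    return (c % sk) % T

rng = random.Random(2024)
sk = keygen(rng)
c1, c2 = encrypt(sk, 12, rng), encrypt(sk, 30, rng)

# The server adds and multiplies ciphertexts without knowing sk...
c_sum, c_prod = c1 + c2, c1 * c2
# ...and only the key holder recovers the plaintext results.
print(decrypt(sk, c_sum))   # 42
print(decrypt(sk, c_prod))  # 360
```

Each multiplication grows the noise term, so decryption stays correct only while the noise remains below the secret key; this is why the parameter sizes of a real homomorphic scheme matter.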
The Retrieval phase details are as follows:
1. The data user requests authorization from the trusted centre, providing identity information for access to the cloud server.
2. The trusted centre checks the identity information and generates a token that lets that particular user access the cloud.
3. The data user chooses a set of keywords to search and generates a query, which is encrypted into a trapdoor; the user then sends the trapdoor to the cloud server.
4. The cloud server computes the score for each file vector and returns the result vector to the data user.
5. The data user decrypts the scores, obtains the identifiers of the top-k highest-scoring files, and sends them to the cloud server.
6. The cloud server returns the encrypted files to the data user.

The trusted centre plays an important role in identifying users. During the file retrieval process, the data user requests cloud access from the trusted centre. When the user's identity information is satisfied, the trusted centre provides a token to the data user for access to the cloud, and the data user's access information is also shared with the cloud. When the identity information provided by the data user is not satisfied, the data user is not allowed to access the cloud.

The communication overhead becomes very large if the encrypted trapdoor is too large. To solve this problem and improve efficiency, some of the security of the search scheme may have to be traded off, unless a new encryption scheme with a more reasonable ciphertext size becomes available. Ranking is performed using the top-k select algorithm [9], where k denotes the number of files that are most important to the user. Our proposed system reduces information leakage by employing homomorphic encryption to preserve data privacy.
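Initialization steps 2 and 3 and Retrieval steps 3 to 5 can be combined into one end-to-end sketch. Assumptions: a toy symmetric, DGHV-style homomorphic scheme stands in for the scheme actually used; TF-IDF weights are computed as tf · ln(N/df), scaled by 100 and rounded to integers; the trapdoor is an encrypted 0/1 query vector; and Python's `heapq.nlargest` replaces the top-k select algorithm of [9]. The sample files and all function names are hypothetical, and key distribution to the data user is not modelled (the user is simply given the secret key).

```python
import heapq
import math
import random
from collections import Counter

# Hypothetical toy DGHV-style symmetric homomorphic scheme (insecure sizes).
T = 2**20

def keygen(rng):
    return rng.randrange(2**200, 2**201) | 1

def encrypt(sk, m, rng):
    return m + T * rng.randrange(0, 2**8) + sk * rng.randrange(2**220, 2**221)

def decrypt(sk, c):
    return (c % sk) % T

def build_index(files):
    """Initialization (owner): scaled TF-IDF weight per file and dictionary keyword."""
    dictionary = sorted({w for words in files.values() for w in words})
    n = len(files)
    df = Counter(w for words in files.values() for w in set(words))
    index = {}
    for f, words in files.items():
        tf = Counter(words)
        index[f] = [round(100 * tf[w] * math.log(n / df[w])) for w in dictionary]
    return dictionary, index

def trapdoor(sk, dictionary, keywords, rng):
    """Retrieval step 3 (user): encrypted 0/1 query vector over the dictionary."""
    return [encrypt(sk, int(w in keywords), rng) for w in dictionary]

def score(enc_index, trap):
    """Retrieval step 4 (cloud): homomorphic inner product for each file."""
    return {f: sum(c * t for c, t in zip(vec, trap)) for f, vec in enc_index.items()}

def top_k(sk, enc_scores, k):
    """Retrieval step 5 (user): decrypt the scores and keep the k best file ids."""
    plain = {f: decrypt(sk, c) for f, c in enc_scores.items()}
    return heapq.nlargest(k, plain, key=plain.get)

rng = random.Random(7)
files = {"f1": ["cloud", "encryption", "cloud"],
         "f2": ["ranking", "search"],
         "f3": ["cloud", "search", "search"]}
sk = keygen(rng)                          # Initialization step 1
dictionary, index = build_index(files)    # Initialization step 2
enc_index = {f: [encrypt(sk, v, rng) for v in vec]  # Initialization step 3
             for f, vec in index.items()}

enc_scores = score(enc_index, trapdoor(sk, dictionary, {"cloud", "search"}, rng))
print(top_k(sk, enc_scores, k=2))  # ['f3', 'f1']
```

The cloud only multiplies and adds ciphertexts, so it never sees a plaintext score, while the user performs the cheap steps of decryption and top-k selection. Every ciphertext here is several hundred bits long even though it carries a small integer, which illustrates why the size of the encrypted trapdoor drives the communication overhead.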
4. SCREEN SHOTS
4.1 Home Page
4.2 Registration Page
4.3 Owner Login
4.4 File Upload
4.5 Search Keyword

5. CONCLUSION
In this paper, we address the problem of multi-keyword ranked search over encrypted cloud data and propose a TRSE scheme employing homomorphic encryption, which fulfils the security requirements of multi-keyword top-k retrieval over encrypted cloud data. The homomorphic encryption algorithm provides privacy for the data, and the similarity-relevance ranking scheme we define improves file retrieval. Finally, the security and confidentiality of data are maintained in the cloud.

References
[1] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky, "Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions," Proc. 13th ACM Conf. Computer and Communications Security (CCS), 2006.
[2] D. Song, D. Wagner, and A. Perrig, "Practical Techniques for Searches on Encrypted Data," Proc. IEEE Symp. Security and Privacy, 2000.
[3] D. Boneh, G. Di Crescenzo, R. Ostrovsky, and G. Persiano, "Public Key Encryption with Keyword Search," Proc. Int'l Conf. Theory and Applications of Cryptographic Techniques (Eurocrypt), 2004.
[4] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, "Secure Ranked Keyword Search over Encrypted Cloud Data," Proc. IEEE 30th Int'l Conf. Distributed Computing Systems (ICDCS), 2010.
[5] A. Swaminathan, Y. Mao, G.-M. Su, H. Gou, A. L. Varna, S. He, M. Wu, "Confidentiality-Preserving Rank-Ordered Search," Proc. Workshop Storage Security and Survivability, 2007.
[6] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, "Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data," Proc. IEEE INFOCOM, 2011.
[7] H. Hu, J. Xu, C. Ren, and B. Choi, "Processing Private Queries over Untrusted Data Cloud through Privacy Homomorphism," Proc. IEEE 27th Int'l Conf. Data Engineering (ICDE), 2011.
[8] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Order Preserving Encryption for Numeric Data," IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120.
[9] J. Yu, P. Lu, Y. Zhu, G. Xue, and M. Li, "Toward Secure Multikeyword Top-k Retrieval over Encrypted Cloud Data," IEEE Trans. Dependable and Secure Computing, vol. 10, no. 4, July/Aug. 2013.