Approximate String Search for Retrieving Errorious Data in Spatial Database
Keywords:Approximate string search, Road network, Selectivity estimation, Spatial database
This work deals with approximate string search in large spatial database. Specifically focus on selectivity estimation for RSAS query in road networks. Selectivity estimation in road network is a union of string selectivity and spatial point selectivity. In this paper we propose a novel adaptive selection method, which is based on grouping technique. String selectivity is achieved by using q-grams and min-wise signature of strings. String similarity is measured by using edit distance metric technique, which is used to calculate threshold value between strings. Spatial point selected by using grouping technique called greedy algorithm. Space complexity of grouping method is based on neighborhood nodes. Effectiveness of this approach is high, when applying in large database.
FeifeiLi,Bin Yao, Mingwang Tang, and MariosHadjieleftheriou” Spatial approximate string search” IEEE transaction on knowledge and data engineering,Vol.25, No. 6, pp.1394-1409, JUNE 2013.
SwarupAcharya, ViswanathPoosala, and SridharRamaswamy, “Selectivity Estimation in Spatial Databases,” Proc. ACM SIGMOD’99 InternationalConference on Management of Data, pp. 13-24, Vol. 28, No.2, June 1999.
SurajitChaudhuri, VenkateeshGanti, and RaghavKaushik, “A Primitive Operator for Similarity Joins in Data Cleaning,” Proc. 22 International Conerance on Data Eng.(ICDE), pp. 5-16, 2006, April 2006.
ArturasMazeika, Mickel.H. Bo¨hlen, Nick Koudas, and DiveshSrivastava,“Estimating the Selectivity of Approximate String Queries,” ACM Trans. Database Systems, Vol. 32, No. 2, pp. 12-52, June 2007.
E. Tiakas, A.N. Papadopoulos, A. Nanopoulos, and Y. Manolopoulos,“Node and Edge Selectivity Estimation for Range Queries in Spatial Networks,” Information Systems, Vol. 34, pp. 328-352, May2009.
ShashiShekhar and DuenRen Liu, “CCAM: A Connectivity- Clustered Access Method for Networks and Network Computations,” IEEE Trans. Knowledge and Data Engineering, Vol. 9, No. 1, pp. 102-119, January 1997.
ArvindArasu, SurajitChaudhuri, Kris Ganjam, and RaghavKaushik, “Incorporating String Transformations in Record Matching,” Proc. ACM SIGMOD’08 International Conf. Management of Data, pp. 1231-1234, June2008.
SurajitChaudhuri, Kris Ganjam, VenkateshGanti, and RajeevMotwani, “Robust and Efficient Fuzzy Match for Online Data Cleaning,” Proc. ACM SIGMOD03 Int’l Conf. Management of Data, pp. 313-324, June 2003.
Xiaochun Yang, Bin Wang, and Chen Li, “Cost-Based Variable- Length-Gram Selection for String Collections to Support Approximate Queries Efficiently,” Proc. ACM SIGMOD’08 International Conf. Management of Data, pp. 353-364, June 2008.
DimitrisPapadias, Jun Zhang, Nikos Mamoulis, and Yufei Tao, “Query Processing in Spatial Network Databases,” Proc. 29Int’l Conf. Very Large Data Bases VLDB’03, pp. 802-813, September 2003.
DimitriosGunopulos, George Kollios,Vassilis J. Tsotras, and Carlotta Domeniconi,“Selectivity Estimators for Multidimensional Range Queries Over Real Attributes,” Springer,The VLDB J., Vol. 14, No. 2, pp 137-154, April 2005.  Liang Jin and Chen Li, “Selectivity Estimation for Fuzzy String Predicates in Large Data Sets,” Proc. 31Int’l Conf. Very Large Data Bases (VLDB03), pp. 397-408, August 2005.
Andrei Z. Broder, Moses Charikar, Alan M. Frieze, and MichealMitzenmacher, “Min-Wise Independent Permutations (Extended Abstract),” Proc.ACM 30th Symp. Theory of Computing,STOC’98, pp. 327-336, May1998.
Edith Cohen, “Size-Estimation Framework with Applications to Transitive Closure and Reachability,” J. Computer and System Sciences, Vol. 55, No. 3, pp. 441-453, December 1997.
How to Cite
Copyright (c) 2014 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.