ONLINE SURVEY SAMPLE AND DATA QUALITY PROTOCOLS

Sample and Data Quality
Socratic Technologies, Inc., has developed sophisticated sample scanning and quality assessment programs to identify and correct problems that may lead to reduced data reliability and bias.

Historical Perspective
From the earliest days of research there have been problems with sample quality (i.e., poor recruiting, inaccurate screening, bias in sample pools, etc.), and potential respondents have attempted to submit multiple surveys (paper and pencil), lied to get into compensated studies (mall intercepts and focus groups) and displayed lazy answering habits (all forms of data collection). In the age of Internet surveying this has become a highly discussed topic, because we now have the technology to measure sample problems and to detect exactly how many people are involved in "bad survey behaviors." While this puts a keen spotlight on the nature of the problems, we also have the technology to correct many of these issues in real time. Because we are now aware of potential issues, we are better prepared than at any time in the past to deal with threats to data quality. This paper details the steps and procedures that Socratic Technologies uses to ensure the highest data quality by correcting problems in both sample sourcing and bad survey behavior.

Sample Sources & Quality Procedures
The first line of defense in overall data quality is the sample source, and catching problems begins with examining the way panels are recruited. Panels and sample sources are like wine: if you start with poor grapes, no matter the skill of the winemaker, the wine is still poor. According to a variety of industry sources, preidentified sample sources (versus Web intercepts using pop-up invitations or banner ads) now account for almost 80% of U.S. online research participants, and this proportion is growing. Examples include:
• Opt-in lists
• Customer databases
• National research panels
• Private communities

A common benefit of all of these sources is that they include a ready-to-use database from which a random or predefined sample can be selected and invited. In addition, prerecruitment helps to solidify the evidence of opt-in permission for contact, or to more completely establish an existing business relationship, at least one of which is needed to meet the requirements for email contact under the federal CAN-SPAM Act of 2003.

In truth, panels of all kinds contain some level of bias driven by the way the recruitment strategy is managed; how panels are recruited determines the long-run quality of the respondents they produce. At Socratic we rely on panels that are recruited primarily through direct invitation, and we exclude sample sources that are recruited using a "get-paid-for-taking-surveys" approach. This ensures that the people we invite to our surveys are not participating for strictly mercenary purposes, which has been shown to distort answers (i.e., answering questions in such a way as to "please" the researcher in exchange for future monetary rewards). In addition, we work with panel partners who undertake thorough profile verification and database cleaning procedures on an ongoing basis.

[Figure 1: Socratic Technologies panel development Web site]

Our approved panel partners regularly scan databases for:
• Unlikely duplicated Internet server addresses
• Series of similar addresses (abc@hotmail, bcd@hotmail, cde@hotmail, etc.)
• Replicated mailing addresses (for incentive checks)
• Other data that might indicate multiple sign-ups by the same individual
• Lack of responsiveness (most drop panelists if they fail to respond to five invitations in a row)
• Non-credible qualifications (e.g., persons who consistently report ownership of or experience with every screening option)
• A history of questionable survey behavior (see "Cheating Probability Score" later in this document)
• Impossible changes to profiling information (e.g., a 34-year-old woman becoming an 18-year-old man)
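The exact rules our panel partners apply are proprietary and vary by vendor. Purely as an illustration of this kind of database hygiene, the sketch below flags look-alike email series and replicated mailing addresses in a hypothetical panel file; the field names (panelist_id, email, mailing_address) and the similarity cutoff are assumptions for the sketch, not any vendor's production logic.

# Hypothetical sketch of a panel-file hygiene scan: flags look-alike email
# series (abc@..., bcd@..., cde@...) and duplicated mailing addresses.
# Field names and thresholds are illustrative only.
from collections import defaultdict
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()

def scan_panel(members: list[dict]) -> set[str]:
    """Return the set of panelist IDs flagged for manual review."""
    flagged = set()

    # 1) Replicated mailing addresses (e.g., for incentive checks).
    by_address = defaultdict(list)
    for m in members:
        key = m["mailing_address"].lower().replace(" ", "")
        by_address[key].append(m["panelist_id"])
    for ids in by_address.values():
        if len(ids) > 1:
            flagged.update(ids)

    # 2) Series of similar email addresses on the same domain.
    by_domain = defaultdict(list)
    for m in members:
        local, _, domain = m["email"].lower().partition("@")
        by_domain[domain].append((local, m["panelist_id"]))
    for entries in by_domain.values():
        entries.sort()
        for (a_local, a_id), (b_local, b_id) in zip(entries, entries[1:]):
            if similarity(a_local, b_local) > 0.6:   # loose, illustrative cutoff
                flagged.update({a_id, b_id})

    return flagged

if __name__ == "__main__":
    demo = [
        {"panelist_id": "p1", "email": "abc@hotmail.com", "mailing_address": "1 Main St"},
        {"panelist_id": "p2", "email": "bcd@hotmail.com", "mailing_address": "2 Oak Ave"},
        {"panelist_id": "p3", "email": "zq9@gmail.com",   "mailing_address": "1 Main St"},
    ]
    # All three are flagged: p1/p2 have look-alike emails, p1/p3 share an address.
    print(scan_panel(demo))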
Socratic's Network of Global Panel Providers
The following list details the panel partners (subject to change) to whom we regularly turn for recruitment on a global basis, together with the countries or regions each covers.

3D interactive.com: Australia
42 Market Research: France
Accurate Market Research: Mexico
Adperio: US
Advaith: Asia
AG3: Brazil, Argentina, Mexico, Chile
AIP Corporation: Asia
Alterecho: Belgium
ARC: Poland
Aurora: UK
Aussie Survey: UK and Australia
Authentic Response: US
Beep World: Austria, Switzerland, Germany
BestLife: LATAM
Blueberries: Israel
C&R Research Services, Inc.: US
Campus Fund Raiser: US
Cint: All countries
Clear Voice / Oceanside: All countries
Community View: India
Corpscan: India
Cotterweb: US
Data Collect: Czech Republic
Delvinia: Canada
EC Global Panel: LATAM, US
Eksen: Turkey
Embrain Co.: Asia
Empanel: US
Empathy Panel: Ireland
Empowered Comm.: Australia
ePanel Marketing Research: China
Erewards/ResearchNOW: All countries
Esearch: US, Canada, UK
EuroClix B.V. (Panelclix): Netherlands
Flying Post: UK, France, Germany
Focus Forward: US
Gain: Japan
Garcia Research Associates: US
HRH: Greece
IID Interface in Design: Asia
Inzicht: Netherlands, Belgium, France
iPanelOnline: Asia
Ithink: US
Itracks: Canada
Ivox: Belgium
Lightspeed Research (UK Kantar Group): Italy, Spain, Germany, Australia, New Zealand, Netherlands, France, Sweden, UK, Switzerland
Livra: LATAM
Luth Research: US
M3 Research: Nordics
Maktoob Research: Middle East
Market Intelligence: US, EU
Market Tools: US, UK, France, Canada, Australia
Market-xcel: India, Singapore
Masmi: Ukraine, Hungary, Russia
Mc Million: US
Mo Web: EU
My Points: US, Canada
Net, Intelligence & Research: Korea
Netquest: Portugal, Spain
ODC Service: Italy, France, Germany, Spain, UK
Offerwise: US
OMI: Russia, Ukraine
Opinion Health: US
Opinion Outpost/SSI: All countries
Opinions: UAE, Saudi Arabia
Panel Base: UK
Panel Service Africa: South Africa
Panelbiz: EU
Panthera Interactive: All countries
Precision Sample: US
Public Opinious: Canada
Pure Profile: UK, US, Australia
Rakuten Research: Japan
Resulta: Asia
RPA: Asia
Sample Bus: Asia
Schlesinger Assoc.: US
Seapanels: Asia
Spec Span: US
Spider Metrix: Australia, UK, Canada, New Zealand, South Africa, US
STR Center: All countries
Telkoma: South Africa
Testspin/WorldWide: All countries
Think Now Research: US
TKL Interactive: US
TNS New Zealand: New Zealand
Toluna: All countries
united sample: All countries
Userneeds: Nordics
uthink: Canada
World One Research: US, France, Germany, Spain, India, China, Japan
YOC: Germany
YOUMINT: India
Zapera (YouGov): All countries

Additional partners in our network include GMI, Amry Research, Inquision, Insight CN, Quick Rewards, Lab 42, Nerve Planet and WebMD Market Research Services.
Anti-Cheating Protocols
As a first step in identifying and rejecting bad survey behavior, we need to differentiate between cheating and lazy behavior. Unlike other data collection modes, the server technology used in online surveys gives the researcher far more control over in-process problems related to cheating and lazy behavior, and the solutions Socratic uses differ by class of delinquency.

Cheaters attempt to enter a survey multiple times in order to:
• Collect compensation
• Sabotage results

Lazy respondents don't really think, and do the least amount of work possible, in order to:
• Receive compensation
• Avoid the burden, boredom or fatigue of long, repetitious, difficult surveys

Many forms of possible cheating and lazy respondent behavior can be detected using server-based data and response pattern recognition technologies. In some cases, bad respondents are immediately detected and rejected before they even begin the survey. This is critical for quality, because "illegitimate" or "duplicated" respondents decrease the value of every completed interview. Sometimes we allow people to enter the survey, but then use pattern recognition software to detect answer sequences that warrant "tagging and bagging." Note: While we inform cheaters that they're busted and won't be getting any incentive, we don't tell them how they were caught!

One of our key tools in assessing the quality of a respondent is the Socratic Cheating Probability Score (CPS). A CPS looks at many possible problems and classifies the risk associated with accepting an interview as "valid and complete." However, we also need to be careful not to use a medium probability score as an automatic disqualifier: just because the results are not what we expect doesn't mean they are wrong. Marginal scores should be used to "flag" an interview, which should then be reviewed before it is rejected. High scores are usually rejected mid-survey, before the respondent is qualified as having "completed."

Here are some examples of how we use technology to detect and reject common respondent problems.

Repeat Survey Attempts
Some cheaters simply attempt to retake surveys over and over again. These are the easiest to detect and reject. To avoid self-selection bias, most large surveys today are done "by customized invitation" (CAN-SPAM 2003) and use a "handshake" protocol, preregistering individuals with verified profiling data in order to establish double or triple opt-in status.

Cheaters Solutions: Handshake Protocols
A handshake protocol entails generating a unique URL suffix code, which is used for the link to the survey in the email invitation. It is tied to a specific individual's email address and/or panel member identification, and once it is marked as "complete" in the database no other submissions are permitted on that person's account. An example of this random suffix code is as follows:

http://sotechsurvey.com/survey/?pid=wx54Dlo1
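Socratic's production handshake is implemented server-side and is not published. As a minimal sketch of the idea only, the snippet below issues a unique random suffix code per invited panelist and refuses any submission whose code is unknown or already marked complete; the HandshakeRegistry class, its method names and the in-memory storage are hypothetical.

# Minimal sketch of a "handshake" invitation protocol: each invitee gets a
# one-time suffix code tied to their panel ID, and a completed code can
# never be submitted again. Names and storage are illustrative only.
import secrets

class HandshakeRegistry:
    def __init__(self):
        self._codes = {}   # suffix code -> {"panelist": id, "complete": bool}

    def issue_invitation(self, panelist_id: str, base_url: str) -> str:
        """Generate a unique suffix code and return the personalized survey link."""
        code = secrets.token_urlsafe(6)   # short random suffix, e.g., 'wx54Dlo1'-style
        self._codes[code] = {"panelist": panelist_id, "complete": False}
        return f"{base_url}?pid={code}"

    def accept_submission(self, code: str) -> bool:
        """Allow the completed questionnaire only for a known, not-yet-used code."""
        record = self._codes.get(code)
        if record is None or record["complete"]:
            return False                  # unknown code or repeat attempt
        record["complete"] = True         # lock the account after completion
        return True

if __name__ == "__main__":
    registry = HandshakeRegistry()
    link = registry.issue_invitation("panelist-123", "http://sotechsurvey.com/survey/")
    code = link.split("pid=")[1]
    print(registry.accept_submission(code))   # True:  first, legitimate attempt
    print(registry.accept_submission(code))   # False: repeat attempt is rejected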
Supplementing the invitation handshake, a cookie check is utilized. At the time a survey is finished (complete or termination), a cookie bearing the survey_id is placed on the user's machine. At the start of all surveys, Socratic looks for a cookie bearing that survey_id; if it is found, the user is not allowed to take the survey again. The respondent ID is also immediately blocked, so that even if the respondent removes the cookie later on, he or she still won't be allowed back in.

But cookie checks are no longer sufficient by themselves to prevent multiple submission attempts; more advanced identification is needed. For this, Socratic utilizes an IP & Browser Config Check, a server-level test that is invisible to the respondent. Whenever a person's browser hits a Web site, it exchanges information with the Web server in order for the Web pages (or survey pages) to display correctly. For responses to all surveys, a check can be made on multiple elements:

IP Address
The first level of validation comes from checking the IP address of the respondent's computer. IP addresses are usually assigned based on a tightly defined geography, so if someone is supposed to be in California and their IP address indicates a China-based service, the attempt is flagged as potential cheating.

Browser String
Each browser sends a great deal of information about the user's system to the survey server. These strings are logged, and subsequent survey attempts are compared to determine whether exact matches are occurring. These are examples of browser strings:
• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; Advanced Searchbar; .NET CLR 1.1.4322; .NET CLR 1.0.3705; KelkooToolbar 1.0.0)
• Mozilla/4.0 (compatible; MSIE 6.1; Windows 95/98/NT/ME/2000/XP; 10290201-SDM; SV1; .NET CLR 1.0.3)

Language Setting
Another browser-based information set that is transmitted consists of the language settings for the user's system. These, too, are logged and compared with subsequent survey attempts:
• en-us,x-ns1pG7BO_dHNh7,x-ns2U3
• en-us,en;q=0.8,en-gb;q=0.5,sv;
• zh-cn;q=1.0,zh-hk;q=0.9,zh-tw;
• en-us, ja;q=0.90, ja-jp;q=0.93

Internal Clock Setting
Finally, the user's computer has an internal time-keeping function that continuously tracks the time of day and date out to a number of decimal places. Each user's computer will vary slightly, even within the same time zone or within the same company's system.

When these four measurements are taken together, the probability of two respondents matching exactly on all readable elements is extremely low.
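The server-level implementation of this check is not exposed publicly. The sketch below is a rough illustration of how the four readable elements (IP address, browser string, language setting and client clock offset) might be combined into a single signature and compared against earlier attempts; the function names, the clock-offset rounding and the exact-match rule are assumptions for the sketch.

# Illustrative composite "configuration signature" built from the four
# readable elements discussed above. Field handling and the duplicate rule
# are assumptions, not Socratic's production logic.
import hashlib

def config_signature(ip: str, user_agent: str, accept_language: str,
                     clock_offset_ms: float) -> str:
    """Hash the readable request attributes into a single comparable key."""
    # Round the client/server clock difference so tiny jitter doesn't matter.
    bucketed_offset = round(clock_offset_ms / 250.0)
    raw = "|".join([ip, user_agent, accept_language, str(bucketed_offset)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def is_duplicate_attempt(signature: str, seen_signatures: set[str]) -> bool:
    """Flag the attempt if an identical signature was already logged for this survey."""
    if signature in seen_signatures:
        return True
    seen_signatures.add(signature)
    return False

if __name__ == "__main__":
    seen: set[str] = set()
    first = config_signature("203.0.113.7",
                             "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
                             "en-us,en;q=0.8", 1234.0)
    print(is_duplicate_attempt(first, seen))   # False: new configuration
    print(is_duplicate_attempt(first, seen))   # True:  exact repeat is flagged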
Techno-Cheaters
Some cheaters are caught because they are trying to use technology, such as form populators and keystroke replicators, to auto-fill and submit multiple surveys. Technology for cheating in online surveys has proliferated over the past 10 years and in some areas of the world has become a cottage industry. However, with the correct server technology, Socratic can detect the profiles of cheating applications and thwart them in real time, before a survey is completed.

Techno-Cheaters Solutions
Total automation can be thwarted by creating non-machine-readable code keys that are used at the beginning of a survey to make sure a human being, rather than a computer "bot," is responding. We refer to this as a Handshake Code Key Protocol. One of the most popular Handshake Code Key Protocols is CAPTCHA. To prevent bots and other automated form completers from entering our surveys, a distorted image of a word or number can be displayed on the start screen of all Socratic projects. In order to gain access to a survey, the user has to enter the word or number shown in the image into a text box; if the entry does not match the image, the user is not allowed to enter the survey. (Note: Some dispensation and alternative forms of code keys are available for visually impaired individuals.) As computers become more and more sophisticated in their ability to detect patterns, CAPTCHA distortions have become more complex.

[Figure: examples of distorted-text images that can and cannot be "read" by image recognition bots (as of 2013). Images adapted from Mori & Malik, ca. 2003, Breaking a Visual CAPTCHA, http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html]

Lazy Respondent Behavior
A far more common problem with survey takers across all modes of data collection is people who just don't take the time and effort to answer questions carefully, which can result in rushed surveys or surveys with replicated answer patterns. Lazy behavior is far more prevalent as a survey problem than outright cheating, primarily because it is easier to defeat cheaters than people who aren't paying attention. With newer, more sophisticated algorithms, however, it is now possible to limit the influence of lazy respondents in mid-survey.

There are several reasons why respondents don't pay attention:
• Problem 1: Just plain lazy
• Problem 2: Survey design is torturous
  - Too long
  - Boring/repetitious
  - Too difficult
  - Not enough compensation
  - No affinity with the sponsor

But whatever the reason for lazy behavior, the symptoms are similar, and the preventative technologies are the same.

Speeders
In the case of rushed respondents ("speeders"), speed of submission can be used to detect surveys completed too quickly. One statistical metric that Socratic uses is the Minimum Survey Time Threshold. By adapting a normative formula for estimating the length of a survey based on the number of various types of questions, one can calculate an estimated time to completion and determine whether the actual time is significantly lower. This test is run at a predetermined point, before the survey has been completed. Based on the time since starting, the number of closed-ended (CE) questions, and the number of open-ended (OE) questions, a determination is made as to whether the respondent has taken an adequate amount of time to answer the questions:

If (Time < 0.5 × ((# of CEs × secs/CE) + (# of OEs × secs/OE))) Then FLAG
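The threshold test above translates directly into code. In the sketch below, the per-question timing norms (secs_per_ce, secs_per_oe) are placeholder values; Socratic's actual normative timings are not published.

# Restatement of the Minimum Survey Time Threshold test shown above.
# The per-question timing norms are illustrative placeholders only.
def is_speeder(elapsed_secs: float,
               num_closed_ended: int,
               num_open_ended: int,
               secs_per_ce: float = 12.0,
               secs_per_oe: float = 45.0,
               cutoff: float = 0.5) -> bool:
    """Flag the respondent if elapsed time is below half the expected time."""
    expected = num_closed_ended * secs_per_ce + num_open_ended * secs_per_oe
    return elapsed_secs < cutoff * expected

if __name__ == "__main__":
    # 20 closed-ended + 2 open-ended questions => 330 expected seconds,
    # so anything under 165 seconds is flagged at the mid-survey checkpoint.
    print(is_speeder(elapsed_secs=120, num_closed_ended=20, num_open_ended=2))  # True
    print(is_speeder(elapsed_secs=300, num_closed_ended=20, num_open_ended=2))  # False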
Replicated Patterns
Another common problem caused by lazy behavior is the appearance of patterned answers throughout a survey (e.g., choosing the first answer for every question, or selecting a single rating point for all attributes). These are fairly easy to detect, and the respondent can be intercepted in mid-survey and asked to reconsider patterned sequences. Socratic uses Pattern Recognition Protocols within a survey to detect and correct these types of problems. Here are some of the logic-based solutions we apply for common patterning problems (a simple sketch of these checks appears at the end of this section):
• XMas Treeing: This technique identifies those who "zig-zag" their answers (e.g., 1, 2, 3, 4, 5, 4, 3, 2, 1, etc.).
  - How to: When all attributes are completed, take the absolute value of each attribute-to-attribute difference. If the mean value is close to 1, flag the respondent.
• Straight-Lining: This technique identifies those who straight-line their answers (e.g., taking the first choice on every answer set, or entering 4, 4, 4, 4, 4, 4 on a matrix).
  - How to: Take the absolute value of each attribute-to-attribute difference as above. If the mean value is 0 (every answer identical), flag the respondent.

Random Answers
While these Pattern Recognition Protocols pick up many common problems, they cannot detect random answer submission (e.g., 1, 5, 3, 2, 5, 4, 3, 1, 1, etc.). For this we need another type of logic: Convergent/Divergent Validity tests. This type of test relies on the assumption that similar questions should be answered in a similar fashion and that polar opposites should receive inverse reactions. For example, if someone strongly agrees that a product concept is "expensive," he or she should not also strongly agree that the same item is "inexpensive." When these types of tests are in place, the survey designer has some flexibility to intercept a survey with validity issues and request that the respondent reconsider his or her answers.
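As a rough sketch of these Pattern Recognition Protocols and a convergent/divergent check, the functions below flag straight-lining, XMas treeing and contradictory agreement with an opposite-item pair. The tolerance, scale maximum and the "expensive/inexpensive" pairing are illustrative assumptions, not Socratic's production rules.

# Illustrative in-survey pattern checks: straight-lining, "XMas treeing"
# (zig-zag answers) and a simple convergent/divergent consistency test.
from statistics import mean

def mean_abs_step(ratings: list[int]) -> float:
    """Mean absolute difference between consecutive attribute ratings."""
    return mean(abs(b - a) for a, b in zip(ratings, ratings[1:]))

def is_straight_lining(ratings: list[int]) -> bool:
    """Every rating identical gives a mean step of 0."""
    return mean_abs_step(ratings) == 0

def is_xmas_treeing(ratings: list[int], tolerance: float = 0.1) -> bool:
    """Zig-zag answers (1, 2, 3, 4, 5, 4, 3, 2, 1, ...) give a mean step close to 1."""
    return abs(mean_abs_step(ratings) - 1) <= tolerance

def fails_divergent_check(agree_expensive: int, agree_inexpensive: int,
                          scale_max: int = 5) -> bool:
    """Strongly agreeing with both an item and its polar opposite is suspect."""
    return agree_expensive >= scale_max - 1 and agree_inexpensive >= scale_max - 1

if __name__ == "__main__":
    print(is_straight_lining([4, 4, 4, 4, 4, 4]))            # True
    print(is_xmas_treeing([1, 2, 3, 4, 5, 4, 3, 2, 1]))      # True
    print(is_straight_lining([1, 5, 3, 2, 5, 4]))            # False: random answers pass
    print(fails_divergent_check(agree_expensive=5, agree_inexpensive=5))  # True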
Cross-Survey Answer Block Sequences
Occasionally, other anti-cheating and anti-lazy-behavior protocols will fail to detect a well-executed illegitimate survey. For this purpose, Socratic also scans for repeated sequences using a Record Comparison Algorithm. Questionnaires are continuously scanned, record to record, for major blocks of duplicated field contents (e.g., more than 65% identical answer sequences). Note: Some level of discretion is needed on surveys for which great similarity of opinion or homogeneity in the target population is anticipated. Future development is also planned to scan open-ended comments for duplicated phrases and blocks of similar text within live surveys; currently, this can only be done post hoc.

Post-Survey Panel Cleaning
The majority of problems related to data quality can be detected before a survey is completed. However, a variety of ongoing checks can add even more assurance that respondents are who they claim to be and are located where they claim to be; panel cleaning is necessary for long-run viability. For the panels managed by Socratic Technologies, the quality assurance program therefore extends beyond sample cleaning and mid-survey error testing: we also continuously monitor issues that can only be detected post hoc.

Address Verification
Every third or fourth incentive payment should be made by check, or a notice mailed to a physical address. If people want their reward, they have to drop any aliases or geographic pretexts for delivery to be completed, and cheaters can often be caught prior to distribution of an incentive. Of course, duplicated addresses, P.O. boxes, etc., are a giveaway. We also look for slight name derivatives not usually caught by banks, including:
• Nicknames (Richard Smith and Dick Smith)
• Use of initials (Richard Smith and R. Smith)
• Unusual capitalization (Richard Smith and RiCHard SmiTH)
• Small misspellings (Richard Smith and Richerd Smith)

Conclusion
Many features and security checks are now available for assuring the validity of modern online research, including pre-survey panel quality controls, mid-survey cheating and lazy-behavior detection, and post-survey panel cleaning. With these technologies in place, online research can now be more highly regulated than any other form of data collection.

Not all bad survey behavior is malicious; some is driven by poor survey design. Some discretion will always be a requirement of survey usability:
• Writing screeners that don't telegraph qualification requirements
• Keeping survey length and burden to a reasonable level
• Minimizing the difficulty of compliance
• Enhancing the engagement level of boring tasks
• Maximizing the communication that participation is worthwhile and appreciated

While Socratic's techniques can flag possible cheating or lazy behavior, we believe that the analyst should not automatically reject interviews, but should examine marginal cases for possible validity.

CONTACT

San Francisco Headquarters
Socratic Technologies, Inc.
2505 Mariposa Street
San Francisco, CA 94110-1424
T 415-430-2200 (800-5-SOCRATIC)

Chicago Regional Office
Socratic Technologies, Inc.
211 West Wacker Drive, Suite 1500
Chicago, IL 60606-1217
T 312-727-0200 (800-5-SOCRATIC)

Contact Us: sotech.com/contact

Socratic Technologies, Incorporated, is a leader in the science of computer-based and interactive research methods. Founded in 1994 and headquartered in San Francisco, it is a research-based consultancy that builds proprietary, interactive tools that accelerate and improve research methods for the study of global markets. Socratic Technologies specializes in product development, brand articulation, and advertising research for the business-to-business and consumer products sectors.

Registered Trademarks, Servicemarks and Copyrights
The following product and service descriptors are protected and all rights are reserved: Brand Power Rating, BPR, Brand Power Index, CA, Configurator Analysis, Customer Risk Quadrant Analysis, NCURA, ReportSafe, reSearch Engine, SABR, Site-Within-Survey, Socratic CollageBuilder, Socratic ClutterBook, Socratic Browser, Socratic BlurMeter, Socratic CardSort, Socratic ColorModeler, Socratic CommuniScore, Socratic Forum, Socratic CopyMarkup, Socratic Te-Scope, Socratic Perceptometer, Socratic Usability Lab, The Bruzzone Model, Socratic ProductExhibitor, Socratic Concept Highlighter, Socratic Site Diagnostic, Socratic VirtualMagazine, Socratic VisualDifferentiator, Socratic Web Boards, Socratic Web Survey 2.0, Socratic WebComm Toolset, SSD, Socratic WebPanel Toolset, SWS 2.0, Socratic Commitment Analysis, Socratic WebConnect, Socratic Advocacy Driver Analysis.

Socratic Technologies, Inc. © 1994–2014. Reproduction in whole or in part without written permission is prohibited. Federal law provides severe civil and criminal penalties for unauthorized duplication or use of this material in physical or digital form, including for internal use. ISSN 1084-2624.