In my post last week, I talked about crowdsourcing ratings and reviews to create and sustain the credibility of a platform. Almost every platform that operates in a multi-sided market has a mechanism for the users on one side to rate the other side. In this post, I will talk about how to design an appropriate system for measuring the quality of an entity/ product/ service.
The dictionary definition of rating reads “classification or ranking of someone or something based on a comparative assessment of their quality, standard, or performance.”
At the end of every ride, OLA Cabs requests riders to rate the driver/ cab, and the driver ratings are available to riders when they book a ride. Similarly, Uber has a two-way rating system, where riders rate drivers and drivers rate riders. And for both drivers and riders, average ratings matter for their continued use of the platform.
The primary (definitional) issue with a rating is that it is a comparative score. As a rider takes more and more rides in the OLA system, she is able to compare a particular ride with reference to her other rides in the same system. However, when an Uber loyalist (say, my colleagues from the USA) takes an OLA ride while in India, he rates the ride with reference to his Uber-in-San-Francisco benchmark. And someone who rarely takes an OLA (and otherwise relies on public transport like suburban trains/ buses) would rate his ride with reference to his bus ride. As the references change, the meaning of the same rating changes.

Which brings us to the next concern with ratings: that it is always an overall score. Riders may penalize the driver with a lower rating for any number of reasons: not being able to find the destination, taking a longer route, not keeping the cab clean enough, or even for things outside his control, like a temporarily blocked road. The same could be true of a superlative rating – depending on the rider’s benchmark, he could give the driver five stars in comparison to the crowded Chennai-Chengalpet suburban train that he takes daily.
This is not to say that ratings are not useful. Over long periods, with sufficient data points, ratings do bring out the true quality and standard of performance. The emphasis here is on “long periods of time” and “a large number of data points”. Long periods of time provide sufficient opportunities for services with low ratings to improve their performance, and a large number of data points provides a cushion against freak low (or high) ratings from irrational customers.
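One common way platforms build in that cushion is to shrink an entity’s average toward a platform-wide prior until enough data points accumulate. The sketch below is purely illustrative – the prior mean and prior weight are assumptions for this example, not any platform’s actual parameters:

```python
def adjusted_rating(ratings, prior_mean=4.0, prior_weight=10):
    """Shrink an entity's average rating toward a platform-wide prior.

    With few data points, the adjusted score stays close to the prior,
    so one freak low (or high) rating cannot drag it far. As ratings
    accumulate, the entity's own average dominates. The prior_mean and
    prior_weight values are illustrative assumptions.
    """
    n = len(ratings)
    if n == 0:
        return prior_mean
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)
```

With these assumed parameters, a driver with a single one-star rating scores about 3.7 rather than 1.0, while a driver with a hundred five-star ratings scores close to 5.0 – the “large number of data points” doing its work.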
One insurance against the inclination to rate a service at either extreme (no central tendency works here) is to decompose the rating into the various service touch points. For instance, Jet Airways’ service tracker seeks feedback on every aspect of the flight, making responding to the online questionnaire a drudgery. Such long questionnaires would therefore only attract people who have a reason to provide feedback – those who had a really bad experience and want to express their distress, or those who had so unexpectedly great an experience that they take the effort to fill in the forms. When the service is as expected (good or bad), one wouldn’t expect customers to fill in long forms (unless mandated). Isn’t this why the feedback scores of most of us teachers have high standard deviations?
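A decomposed rating still has to be rolled up into one score for display. A minimal sketch of such a roll-up, assuming a hypothetical set of cab-ride touchpoints and weights (neither taken from any real platform), which also tolerates partially filled forms so the long-questionnaire drudgery can be avoided:

```python
# Hypothetical touchpoints and weights -- assumptions for this sketch,
# not any platform's actual scheme.
TOUCHPOINT_WEIGHTS = {
    "navigation": 0.3,
    "route_choice": 0.2,
    "cleanliness": 0.3,
    "punctuality": 0.2,
}

def composite_score(touchpoint_ratings):
    """Combine per-touchpoint ratings (1-5) into one weighted score.

    Touchpoints the rider skipped are left out, and the remaining
    weights are renormalised, so even a short partial form still
    yields a usable score.
    """
    rated = {k: v for k, v in touchpoint_ratings.items() if v is not None}
    total_weight = sum(TOUCHPOINT_WEIGHTS[k] for k in rated)
    if total_weight == 0:
        return None
    return sum(TOUCHPOINT_WEIGHTS[k] * v for k, v in rated.items()) / total_weight
```

Letting riders answer only two or three touchpoints keeps the form short, while the decomposition still tells the platform *why* a ride scored low rather than just *that* it did.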
As a service aggregation platform, one would want to supplement rating scores with a descriptive assessment (justification) of the rating. For instance, the OLA cabs app requests you to provide the reason for a low rating by choosing from a predefined set of options. One cannot choose multiple options – yet it is quite possible that the driver was late and also had a dirty car. This is where open-ended responses add value. But again, like long itemised rating forms, open-ended questions attract respondents with extreme experiences.
Restaurant aggregators like Zomato, ecommerce firms like Amazon.in, and travel sites like Booking.com have implemented reviews along with ratings. Zomato’s review forms require reviewers to provide details of their visit to the restaurant and the food they ate. In the absence of such information, the reviews may not be relevant to readers, who intend to use them as the basis for their decision making.
Reviews add value by highlighting specific peculiarities in the product/ service offering that could not be captured by the ratings. For instance, that a sensitive Uber driver plays music the rider appreciates is not a standard data point Uber would capture for all its drivers. However, such information would be a great input for subsequent riders of that particular driver, who may choose to engage with him about the music. When this becomes a frequent enough point of discussion in the reviews (enough people write about it, for a sufficient number of drivers, positively or negatively), Uber might take cognisance of it and add it to the standard rating form. This is where detailed analytics of the reviews is required.
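The simplest form of such analytics is counting how often candidate themes surface across reviews, so that a recurring one can graduate into the standard rating form. A toy sketch, assuming a hand-picked keyword list (a real pipeline would mine themes with topic modelling rather than hard-code them):

```python
from collections import Counter
import re

# Hypothetical themes and keywords -- assumptions for this sketch.
THEMES = {
    "music": {"music", "songs", "playlist"},
    "cleanliness": {"clean", "dirty", "smell"},
    "driving": {"rash", "speeding", "smooth"},
}

def theme_counts(reviews):
    """Count how many reviews mention each theme at least once."""
    counts = Counter()
    for text in reviews:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for theme, keywords in THEMES.items():
            if words & keywords:
                counts[theme] += 1
    return counts
```

If “music” keeps topping such counts across many drivers, that is the signal to promote it from free-text peculiarity to a standard rating touchpoint.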
The dictionary definition of review is very insightful to our discussion: “a formal assessment of something with the intention of instituting change if necessary.” Good analysis of reviews should lead to change, if necessary.
Like the shifting-benchmark issue with ratings, reviews suffer from the question of the reviewer’s credibility. It is important that the reviewer is an expert, or has demonstrated that he has used that particular product or service. Amazon.in certifies reviews with a “verified purchase” tag, and gives readers of a review the option to rate whether it was useful at all. Travel sites like Booking.com ensure that reviewers have actually booked their stay at that particular hotel, and display the reviewers’ exact credentials for providing the review. In the absence of such credibility checks, reviews could be abused and gamed in various ways.
Ratings and reviews are good a priori inputs for customers making product/ service selection choices. However, in the case of platforms like Practo, where one chooses physicians (doctors), I am not sure ratings and reviews are sufficient. When an expert client-service provider relationship is being evaluated (where the service provider is more knowledgeable than the service consumer, unlike a typical product purchase, where the customer knows enough to judge what she is buying), ratings and reviews fall flat. Would you choose your dermatologist based on ratings by other patients, or on the recommendation of your trusted general physician?
The dictionary meaning of recommendation is revealing: “a suggestion or proposal as to the best course of action, especially one put forward by an authoritative body.” Notice the phrase – authoritative body. For a recommendation to be taken seriously, credibility is needed not just from having consumed the product/ service, but from other certifications as well. The most popular doctors might not be the most effective. And mind you, the ratings and reviews might just be about the quality of the infrastructure, the waiting time to meet the doctor, the friendliness of the staff and the doctor, and other clinical processes followed by the doctor and her staff. While seeking a recommendation for a serious illness, however, there could be clients who trade these off against the doctor’s effectiveness in curing the illness. This is why platforms like Practo would require doctors to add their certifications and academic credentials, and mandate that they update them every six months, apart from the ratings and reviews by patients.
So, when you design your platform’s user experience and feedback system, choose carefully – is a rating sufficient, or would you also want a review and a recommendation?