Hello,
I'm trying to solve a problem of recommending/predicting 'my favorite drink' and I'm hoping to get some support from this community.
Problem definition: There are 20 different drinks, e.g. Pepsi, Coke, Fanta, etc. There are millions of customers of a supermarket who have been buying those drinks over, let's say, the last three months.
Data definition:
Drinks (id, name): 100:Pepsi, 101:Coke, ...
Transactions (customer_id, list of bought drink ids):
1: 100,100,100,101,101,101
2: 100,102,106,106,106
...
The definition of 'my favorite drink' is a bit foggy. We don't have any training data we can learn from (e.g. a list of fans for a given drink); the only things we have are the transactions and customer data (id, age, postcode). A customer may not have a favorite drink at all, and this should be predicted as well.
These are the 4 approaches I came up with for predicting 'my favorite drink'.
1) 50% ratio - the drink I buy the most. If a single drink accounts for more than 50% of my transactions, then it is my favorite drink. Otherwise I don't have a favorite drink.
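A minimal sketch of this rule in Python (the function name and the assumption that a customer's transactions arrive as a plain list of drink ids are mine):

```python
from collections import Counter

def favorite_by_ratio(drink_ids, threshold=0.5):
    """Return the most-bought drink id if its share of transactions exceeds the threshold, else None."""
    counts = Counter(drink_ids)
    drink, n = counts.most_common(1)[0]
    return drink if n / len(drink_ids) > threshold else None

print(favorite_by_ratio([100, 100, 100, 101, 101, 101]))  # None, top drink is exactly 50%
print(favorite_by_ratio([100, 100, 100, 100, 101, 102]))  # 100
```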
2) Gini index - a more clever version of the 50% ratio. If I bought Pepsi 4 times and 6 other drinks once each, then Pepsi is my favorite drink. Gini index = 1 minus the sum of squares of the drink probabilities; in this case Gini = 1 - ((4/10)^2 + 6*(1/10)^2). I have a favorite drink if the Gini index is < 0.7.
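And a corresponding sketch for the Gini version (same assumptions as above; the 0.7 cutoff is just the value I mentioned, not a tuned one):

```python
from collections import Counter

def favorite_by_gini(drink_ids, cutoff=0.7):
    counts = Counter(drink_ids)
    total = len(drink_ids)
    # Gini index = 1 - sum of squared drink shares; lower means more concentrated buying.
    gini = 1.0 - sum((n / total) ** 2 for n in counts.values())
    return counts.most_common(1)[0][0] if gini < cutoff else None

# shares 4/6, 1/6, 1/6 -> Gini = 1 - (16 + 1 + 1)/36 = 0.5 -> favorite is 100
print(favorite_by_gini([100, 100, 100, 100, 101, 102]))
```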
3) Rationale - my favorite drink doesn't necessarily have to be the one I buy the most. For example, if I bought 49 CopaCopa drinks and 51 Pepsi drinks, then CopaCopa is more likely my favorite one. This is based on the observation that customers who buy CopaCopa are more likely to also buy Pepsi (because Pepsi is a generally popular drink) than the other way round. So if I buy roughly the same number of the unpopular CopaCopa and the popular Pepsi, it probably means I'm more likely a fan of CopaCopa.
Method 3a: Naive Bayes text classifier. For this approach I calculate priors, i.e. the probability of buying a given drink, using Maximum Likelihood over all customers' transaction data, e.g. P(Pepsi)=0.2, P(CopaCopa)=0.02. Then I calculate the conditional probabilities of buying a drink given that I also bought something else, e.g. P(CopaCopa | Pepsi) = 0.03 and P(Pepsi | CopaCopa) = 0.07.
A customer has a favorite drink if a posterior, e.g. P(CopaCopa | Pepsi, Pepsi, CopaCopa, CopaCopa) (the probability of being a fan of CopaCopa given that I bought both Pepsi and CopaCopa twice), is > 50%.
Data for the Bayes classification (one record per single drink transaction). These five records represent one customer who bought three drinks (101, 101, 102) and another customer who bought drink 105 twice:

drink_id (prior) | all drink ids bought by the customer of this transaction (prediction record)
101 | 101,101,102
101 | 101,101,102
102 | 101,101,102
105 | 105,105
105 | 105,105
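A rough sketch of method 3a using scikit-learn's MultinomialNB on exactly this data layout (the toy customers, the drink vocabulary, and the test customer are made up; sklearn's Laplace smoothing makes the probabilities slightly different from raw Maximum Likelihood estimates):

```python
from collections import Counter
import numpy as np
from sklearn.naive_bayes import MultinomialNB

drinks = [100, 101, 102, 105, 106]                 # the drink "vocabulary"
customers = {1: [101, 101, 102], 2: [105, 105]}    # customer_id -> bought drink ids

def bag(drink_ids):
    """Count vector over the drink vocabulary."""
    c = Counter(drink_ids)
    return [c.get(d, 0) for d in drinks]

# One training record per drink transaction: class = the drink of that transaction,
# features = everything the same customer bought (as in the table above).
X, y = [], []
for bought in customers.values():
    for d in bought:
        X.append(bag(bought))
        y.append(d)

model = MultinomialNB().fit(np.array(X), np.array(y))

# A customer has a favorite drink only if the top posterior exceeds 0.5.
probs = model.predict_proba([bag([100, 100, 101, 101])])[0]
best = int(np.argmax(probs))
favorite = model.classes_[best] if probs[best] > 0.5 else None
print(favorite, probs[best])
```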
Method 3b: Logistic regression. I represent transactions as follows:
Target = the drink id of the transaction, predictor variables = the percentages of drinks for the customer who placed this transaction. E.g. for a single customer who bought Pepsi, Pepsi, and CopaCopa, we have three classification records (one per transaction):
target, %pepsi, %copacopa, %coke, ...
pepsi, 2/3, 1/3, 0, 0, ...
pepsi, 2/3, 1/3, 0, 0, ...
copacopa, 2/3, 1/3, 0, 0, ...
A customer has a favorite drink if the logistic regression predicts a drink with > 50% confidence. E.g. I take a customer represented by the classification record 0.1 (CopaCopa), 0.7 (Pepsi), 0 (Coke), ..., and the model says I'm a fan of Pepsi with a confidence of 0.64.
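A rough sketch of method 3b with scikit-learn's LogisticRegression (the toy data and the id-to-name mapping for CopaCopa are made up; the > 0.5 cutoff is the one above):

```python
from collections import Counter
import numpy as np
from sklearn.linear_model import LogisticRegression

drinks = [100, 101, 102]     # 100 = Pepsi, 101 = Coke, 102 = CopaCopa (illustrative ids)
customers = {1: [100, 100, 102], 2: [101, 101, 101], 3: [100, 102, 102]}

def shares(drink_ids):
    """Percentage of each drink in a customer's transactions."""
    c = Counter(drink_ids)
    return [c.get(d, 0) / len(drink_ids) for d in drinks]

# target = the drink of each transaction, predictors = that customer's drink shares
X, y = [], []
for bought in customers.values():
    for d in bought:
        X.append(shares(bought))
        y.append(d)

model = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))

probs = model.predict_proba([shares([100, 102, 102, 102, 101])])[0]
best = int(np.argmax(probs))
favorite = model.classes_[best] if probs[best] > 0.5 else None
print(favorite, probs[best])
```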
I would appreciate any feedback on the presented approaches. Maybe there is a better way to address this problem? I would also be glad to hear about papers describing similar prediction problems in various domains.
Regards.