How would you approach multivariate data analysis if the goal is to create "ordinary" business plots, where one variable is plotted against another and the plot conveys some meaningful statement?
I assume that there is no clear supervised setting, but that some general correlations or causal relations should be extracted. For example, I have table-like data for travel orders with customer data, departure, arrival, destination, price, duration of stay, etc. In a plot I can visualize three dimensions: x, y and color. I can also drill down by one further variable by creating multiple plots, each filtered on a value of that variable. Candidate quantities are the variables themselves and possible aggregations of the numerical variables (sum of price, mean price per day, ...).
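To make the "aggregations" part concrete, here is a minimal sketch in plain Python of what I mean by candidate quantities. The field names (`destination`, `price`, `duration_days`), the helper `aggregate_by` and the sample rows are made up for illustration; my real table has more columns.

```python
from collections import defaultdict

# Hypothetical sample rows; the real data has many more columns.
orders = [
    {"destination": "Rome", "price": 600.0, "duration_days": 3},
    {"destination": "Rome", "price": 900.0, "duration_days": 6},
    {"destination": "Oslo", "price": 1200.0, "duration_days": 4},
]

def aggregate_by(rows, key):
    """Group rows by `key` and compute two candidate plot quantities."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = {}
    for value, members in groups.items():
        # Price per day for each order in the group.
        per_day = [m["price"] / m["duration_days"] for m in members]
        summary[value] = {
            "price_sum": sum(m["price"] for m in members),
            "price_per_day_mean": sum(per_day) / len(per_day),
        }
    return summary

by_destination = aggregate_by(orders, "destination")
```

Each entry in `by_destination` would then be one bar or point in a plot, and the same grouping could be repeated for every categorical variable.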
I would like to data-mine for interesting relations/plots with as much automation as possible.
Some ideas: find 2-variable correlations, or 3-variable associations using lift. Maybe the thresholds for what counts as interesting could be chosen automatically? Maybe graph/network analysis would help with causal relationships, because a relation seen in one plot could be explained indirectly by another variable? Maybe Random Forests could identify relevant or correlated variables? I could also add a rule that variables with a strong association (e.g. price and destination) should always be picked together for a plot.
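As a rough sketch of the correlation and lift ideas (not a finished method), the screening could look something like the following in plain Python. The column names, the sample values, the helper names `pearson`, `rank_pairs` and `lift`, and the 0.8 threshold are all assumptions for illustration. Pearson correlation only covers numeric pairs, and lift here is the usual P(A and B) / (P(A) * P(B)) for two categorical values.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear relation
    return cov / sqrt(vx * vy)

def rank_pairs(columns, threshold):
    """Return numeric column pairs with |r| >= threshold, strongest first."""
    scored = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            scored.append((a, b, r))
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)

def lift(xs, ys, x_val, y_val):
    """Lift of x_val and y_val occurring together: P(A and B) / (P(A) P(B))."""
    n = len(xs)
    p_a = sum(1 for x in xs if x == x_val) / n
    p_b = sum(1 for y in ys if y == y_val) / n
    p_ab = sum(1 for x, y in zip(xs, ys) if x == x_val and y == y_val) / n
    if p_a == 0 or p_b == 0:
        return 0.0
    return p_ab / (p_a * p_b)

# Hypothetical numeric columns of the travel-order table.
numeric = {
    "price": [500.0, 1200.0, 800.0, 1500.0, 300.0],
    "duration_days": [3, 10, 6, 12, 2],
    "party_size": [2, 2, 4, 1, 3],
}
ranked = rank_pairs(numeric, threshold=0.8)

# Hypothetical categorical columns.
destination = ["Rome", "Rome", "Oslo", "Oslo", "Rome", "Oslo"]
season = ["summer", "summer", "winter", "winter", "summer", "summer"]
rome_summer_lift = lift(destination, season, "Rome", "summer")
```

Pairs that survive the threshold, or value combinations with lift well above 1, would be the plot candidates. This does not address the confounding problem, where a third variable explains the relation.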
Of course, an algorithm trained on this data would be most accurate; however, I cannot deliver program code. The deliverable has to be plots.
Do you have any ideas on how to decide which of the many possible plot combinations are candidates for interesting relations?