Hi, my task involves a dataset {(Ai, Xi, yi)}, where Ai is an adjacency matrix, Xi contains input features over its nodes, and yi is the label of (Ai, Xi). That is to say, Ai and Aj may differ in nodes, edges, and dimensions (taking a web graph as an example: Xi may represent a group of users in the U.S., while Xj stands for a group of users in the U.K.; yi is the anomaly-detection label of a single graph, and all data come from the same website). I wonder whether this works with the GCN you propose. In your case, there is only one graph for the whole dataset. I read the comments above and think this might work. However, to my understanding, the parameters of a GCN should fit the structural information of one particular graph. So how could it work if the graph varies with each data point? Thanks for explaining.
Graph Classification & Batchwise Training
Hi, I have two questions:
1. I would like to repurpose this for graph classification (where I have a single label per graph). I see there is an option to choose featureless=False when defining a new GraphConvolution layer. However, the loss is still computed for each node, and I was wondering how I should change your code.
2. In the context of graph classification, how should I modify your train.py for batch-wise training?
Thanks for putting this together!
This question comes from the open-source project: tkipf/gcn
62 replies

Hi,
Yes, you can have a different graph structure (different adjacency matrix, variable number of nodes) for each example. But the dimensionality of your node features should be the same.
For practical considerations: 1) Zero-pad the adjacency matrices to the same dimension and apply a mask to each graph. 2) For the feature dimension, it is preferable to get all graphs into the same dimension (even though you could use a mask and zero padding as in 1)). You can also apply PCA and control the number of dimensions to keep.
Hope it helps
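A toy sketch of the padding-plus-mask idea with NumPy (the function name and shapes are illustrative, not from the repository):

```python
import numpy as np

def pad_graph(adj, feats, max_nodes):
    """Zero-pad one graph's adjacency and feature matrices and build a node mask."""
    n = adj.shape[0]
    adj_pad = np.zeros((max_nodes, max_nodes), dtype=adj.dtype)
    adj_pad[:n, :n] = adj
    feats_pad = np.zeros((max_nodes, feats.shape[1]), dtype=feats.dtype)
    feats_pad[:n] = feats
    mask = np.zeros(max_nodes, dtype=bool)  # True for real nodes, False for padding
    mask[:n] = True
    return adj_pad, feats_pad, mask

# A 2-node graph padded up to 4 nodes
adj = np.array([[0, 1], [1, 0]])
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
adj_pad, feats_pad, mask = pad_graph(adj, feats, max_nodes=4)
```

The mask can then be used to exclude padded nodes from pooling or the loss.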
Thanks very much for your response 👍, it helps a lot. Sure, the feature dimension should be the same. I still have two questions: 1. Can I implement batch-wise training with the adjacency matrices in a tensor of dimension [B, m, m], where B is the batch size and m is the maximum number of nodes in that batch (some columns could be padding)? 2. Why would this work? Since the parameters of a GCN should contain the information of one particular graph, do varied graphs lead to non-convergence?
Thanks for your patience.
1) Yes.
2) Which parameters? And what do you mean by non-convergence?
Thanks for explaining again 👍 The parameters I mean are the learnable parameters in each layer of the GCN. By non-convergence I mean that, since GCN was designed around a single global adjacency matrix, training might fail to converge if the adjacency matrix varies.
How can we plot ROC curves for a GCN that classifies graphs? Thanks in advance.
Maybe this helps: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
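For graph-level classification this amounts to collecting one score per graph and feeding them to sklearn.metrics.roc_curve; a minimal sketch (the labels and scores below are made up):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# One score per graph, e.g. the predicted probability of the positive class
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)  # area under the curve, here 0.75
```

You can then plot fpr against tpr with any plotting library.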
Can we use centrality as node attributes, or will the GCN automatically learn it?
Yes, you can certainly use additional structure-dependent node features. Even adding the node degree as a node feature can help.
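A tiny illustration of appending the node degree as an extra feature column (a sketch, not repository code):

```python
import numpy as np

# A 3-node star graph: node 0 is connected to nodes 1 and 2
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)

X = np.eye(3)                          # identity features, as discussed in this thread
deg = adj.sum(axis=1, keepdims=True)   # node degrees as a column vector
X_aug = np.hstack([X, deg])            # original features + degree column
```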
Thank you so much.
Hi, I took about 200 different-size graphs for graph-level classification and got a good result. Now I'm wondering how to use the trained model to predict graphs without labels. Can I use the model to predict a single graph?
Hi, I now have a dataset consisting of 10k small networks. Each network has 19 nodes, and all of them share the same adjacency matrix; that is to say, only the node features differ among the networks. My task is to classify them into two categories. I want to know whether the two methods ("global node" and "global pooling") are suitable in this case. Thanks for your reply!
Okay. So, to recap: adding an attention layer on top of the GC layers (as you just mentioned), followed by a dense layer with softmax, allows keeping the implemented loss unchanged for graph classification?
I don't remember which loss is implemented in this repository, but probably not. If I recall correctly, this repository uses a sigmoid activation function with the corresponding sigmoid cross-entropy. If you want to do single-label classification, you had better change it to use softmax at the end and softmax cross-entropy as the loss function.
Definitely, softmax cross-entropy will do the job.
I'm wondering why there is no pooling layer in this graph convolutional network?
I think that can answer better than me to this question!
Pooling on graphs hasn't been convincingly shown to help predictive performance yet. By pooling I mean hierarchical pooling, i.e. forming a hierarchy of coarser graphs. That's why we don't consider it here, for simplicity. Using global pooling (like the implementation mentioned above) is still a very good idea, though.
Thank you for your answer. I have some follow-up questions related to your comment:
1) How do you explain that your network performs well without any pooling layer?
2) How do you explain that a hierarchy of coarser graph pooling doesn't help predictive performance? Is it related to the hierarchy itself or to the way we build this hierarchy (adding fake nodes)?
3) I would be very happy if you could give me a pointer to global pooling and its implementation on graphs. Thank you for the idea; I hadn't heard about global pooling on graphs before.
This is getting off-topic. For global pooling, this pointer is a good start: https://arxiv.org/pdf/1703.03130.pdf. Hierarchical pooling typically doesn't work well because you have to define some heuristic, structure-based, fixed pooling operation ahead of time. What kind of coarsening strategy works well is highly domain-dependent. There's currently no silver bullet that works well for all kinds of graph data.
Thank you for your answer. It seems to be a good idea to apply global pooling.
I would like to hear more about that and the global pooling implementation on graphs.
Hi, I have two questions: 1. When I want to use batch-wise training, is it right that I can't feed the model a batch of graphs (and y) unless I concatenate all the X together and build a sparse block matrix A as input (i.e. treat them as one big graph)? 2. When I use your method (a sparse block A), I think I should concatenate my X and y into shapes (M+...+Z) × FeatNum and (M+...+Z) × classNum, with A a sparse block matrix. Could you tell me if I am right and give some guidance? Thanks.
Yes, this is correct.
If you do not need sparse matrix operations (in case your graphs are quite small), you can resort to a much simpler implementation by creating an adjacency tensor of shape B×N×N, where B is the batch size and N is the maximum number of nodes a graph has in this batch. You will need to add row- and column-wise zero-padding if you have graphs of different size in a single batch. The form of the updates for the GCN model in this case should be easy to derive. Hope this helps!
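A minimal NumPy sketch of building such a padded [B, N, N] / [B, N, F] batch (names are illustrative):

```python
import numpy as np

def batch_graphs(adjs, feats):
    """Stack variable-size graphs into [B, N, N] and [B, N, F] with zero padding."""
    B = len(adjs)
    N = max(a.shape[0] for a in adjs)   # largest graph in this batch
    F = feats[0].shape[1]
    A = np.zeros((B, N, N))
    X = np.zeros((B, N, F))
    for i, (a, x) in enumerate(zip(adjs, feats)):
        n = a.shape[0]
        A[i, :n, :n] = a                # row- and column-wise zero padding
        X[i, :n, :] = x
    return A, X

# Two graphs with 2 and 3 nodes, 4 features each
adjs = [np.ones((2, 2)), np.ones((3, 3))]
feats = [np.ones((2, 4)), np.ones((3, 4))]
A, X = batch_graphs(adjs, feats)
```

A batched GCN update can then use batched matrix multiplication over A and X.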
Could you please explain the implementation of graph classification in more detail? Thank you!
Do you have any specific questions?
In order to go from a node-level representation to a graph-level representation, you will indeed have to perform some kind of order-invariant pooling operation. Any popular pooling strategy typically works fine, in my experience: attentive pooling (https://arxiv.org/abs/1703.03130) often works better than max pooling, which in turn often works better than average pooling. Make sure to allocate a somewhat large representation size for the pre-pooling layer (as you're going to average a lot of values). You can also pool the representation of every layer individually and concatenate or add/average the results (similar to https://arxiv.org/abs/1509.09292).
If you don't have any initial node features, simply use an identity matrix as the initial feature matrix (assuming the identity and position of every node is the same across all graph samples). This essentially provides a unique one-hot feature vector for every node in the graph. This only works if node identities stay the same across all graph samples; otherwise you will have to come up with some sensible feature representation (e.g. node degree, betweenness centrality, etc.).
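A small NumPy sketch of order-invariant global pooling with a padding mask (illustrative only; the repository does not ship this helper):

```python
import numpy as np

def global_pool(h, mask, mode="mean"):
    """Order-invariant pooling of node embeddings h [N, F] into one graph vector."""
    if mode == "mean":
        m = mask[:, None].astype(float)
        return (h * m).sum(axis=0) / m.sum()
    # max pooling: ignore padded rows by setting them to -inf before the max
    return np.where(mask[:, None], h, -np.inf).max(axis=0)

h = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [9.0, 9.0]])            # last row is zero-padding, masked out
mask = np.array([True, True, False])
g_mean = global_pool(h, mask, "mean")
g_max = global_pool(h, mask, "max")
```

Both variants return one vector per graph regardless of node ordering.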
Sorry to keep asking questions, but I'm not sure I have the right intuition: the architecture is able to learn feature vectors for nodes only if I always pass the same graph, right? If my training set has different graphs (assume they all have the same dimension) but each row in the adjacency matrix may represent a different node, then this architecture is not a good fit for learning node features, right? Thanks for your patient and kind answers.
No worries! Your intuition is correct in the absence of any node features, i.e. if you pick an identity matrix for the initial feature matrix. If, however, you describe nodes by features (from a feature space which is shared by all nodes, i.e. two nodes could have the same features) the model can learn from different graphs (e.g. molecules), such as done here: https://arxiv.org/abs/1509.09292
Ok, very interesting. So the features could also be derived from the graph itself (degree, centrality, number of triangles containing the node, etc.), right?
Yes, sure! https://arxiv.org/abs/1509.09292 uses node degree as an initial feature, for example.
Hello, have you succeeded in updating the code to deal with graph classification (rather than node classification; each graph has only one class)?
Thank you
I do not intend to release this implementation as part of this repository. But it shouldn't be too difficult to implement this yourself :)
Edit: PRs are welcome
I've implemented a self-attention layer:

```python
class SelfAttention(Layer):
    def __init__(self, attention_dim, bias_dim, hidden_units, **kwargs):
        super().__init__(**kwargs)
        self.hidden_units = hidden_units
        self.A = None
        self.vars['Ws'] = tf.Variable(tf.random_uniform([attention_dim, self.hidden_units]))
        self.vars['W2'] = tf.Variable(tf.random_uniform([bias_dim, attention_dim]))

    def _call(self, inputs):
        # Attention weights over nodes: A = softmax(W2 · tanh(Ws · H^T))
        aux = tf.tanh(tf.matmul(self.vars['Ws'], inputs, transpose_b=True))
        self.A = tf.nn.softmax(tf.matmul(self.vars['W2'], aux))
        tf.summary.histogram('self_attention', self.A)
        # Weighted average of node embeddings, flattened into a single vector
        out = tf.matmul(self.A, inputs)
        out = tf.reshape(out, [out.get_shape().as_list()[0] * out.get_shape().as_list()[1]])
        return out
```

You can stack it on top of the GC layers. (Not sure if it is 100% correct; if you find something wrong please tell me!) Best,
Thank you for your answer. It looks OK (but I'll confirm when testing). So have you modified the loss function so that the loss is computed on the whole graph rather than at each node?
The attention layer takes all the node embeddings into account and builds a weighted average of them (where the weights are learned), outputting a single hidden vector that should represent the whole graph. You can stack another dense layer with softmax on top to do the classification.
So as the last layer, on top of the graph convolutional layers, you add an attention layer (equation 7 in https://arxiv.org/pdf/1511.05493.pdf), then a dense layer with softmax to do classification. This process allows keeping the loss function (primarily defined at the node level) for graph classification. Correct me if I'm wrong!
I don't know the paper you cited (thanks for the pointer, it looks very interesting). I don't think we can keep the same loss function, because it was developed for multiple classifications (each node had its own prediction). Now we have only one prediction, so we can use a classical cross-entropy function (I think).
I thought we were talking about the same attention mechanism. So what did you mean by the attention layer (without the pointer I gave)? Is it implemented here?
I used the self-attention implemented in https://arxiv.org/pdf/1703.03130.pdf, as suggested in some post above, I think.
Hi, it is not clear to me how to give the model a batch of graphs instead of only one. Could you please give more details? Thanks.
Hi,
If I remember correctly (the last time I touched it was in April), it is not possible to forward-pass multiple graphs at once, because you'd need a rank>2 sparse tensor. Attempts were made to support rank>2 sparse tensors in TensorFlow (https://github.com/tensorflow/tensorflow/pull/9373), but it did not work out.
So if you want to pass multiple graphs at once, you should find out whether any of the other frameworks (PyTorch?) support rank>2 sparse tensors.
This is not quite correct: you can build a block-diagonal sparse matrix representing multiple graph instances at once. You further need to introduce a sparse gathering matrix that pools representations from their respective graphs. All this can be done with regular sparse-dense matrix multiplications (of rank 2). Let me know if you need further details to implement this!
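A toy SciPy sketch of the block-diagonal adjacency plus a sparse mean-pooling (gathering) matrix described above (all names are illustrative):

```python
import numpy as np
from scipy.sparse import block_diag, csr_matrix

# Two toy graphs with 2 and 3 nodes
A1 = np.array([[0, 1], [1, 0]])
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_batch = block_diag([A1, A2])          # 5x5 block-diagonal sparse matrix

# Sparse gathering matrix: row g averages the node rows belonging to graph g
sizes = [2, 3]
rows, cols, vals = [], [], []
offset = 0
for g, n in enumerate(sizes):
    for i in range(n):
        rows.append(g)
        cols.append(offset + i)
        vals.append(1.0 / n)
    offset += n
P = csr_matrix((vals, (rows, cols)), shape=(len(sizes), sum(sizes)))

H = np.arange(10.0).reshape(5, 2)       # fake node embeddings, one row per node
graph_emb = P @ H                       # one pooled embedding per graph
```

Both multiplications stay rank-2 sparse-dense, which is why this works in plain TensorFlow 1.x as well.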
Yeah, in fact I previously read about this block-diagonal sparse matrix, but I don't really know what it is or how to create it. I didn't find anything on Google about how to build such a data structure.
The following figure should hopefully clarify the idea:
Oh, great thanks!
Sorry to beat this into the ground, but I'm interested in classifying about 2000 (i.e., N = 2000) 100×100 adjacency matrices (representing graphs) into two different classes, i.e., whole-graph labelling. Going by your figure, this means that I ought to feed a sparse 200,000 × 200,000 adjacency matrix into the model (i.e., each of the 2000 graphs represented along the diagonal). And the output pooling matrix is 200,000 × 2000, with the class labels along the diagonals (i.e., where the 1's are in your figure). Is this correct, or am I completely missing something?
This is correct, but I would recommend training the model using smaller mini-batches of, say, 32 or 64 graphs each. Make sure the pooling matrix is also sparse and that you're using sparse-dense matrix multiplications wherever possible. This should run quite fast in practice. Good luck!
Can I ask whether your graphs have the same node set (all graphs have the same nodes) but different edges, or whether each graph has different nodes?
Best
Same nodes and number of nodes, different edge values.
I see, thanks.
Hi, has anyone successfully used this for graph classification? I implemented a toy example on MNIST, using each pixel as a node and its intensity as the feature. However, the loss did not go down. I modified the graph-conv layer to use dense matrices so it works with a parallel data loader. The "ind" tensor (size N×batch; values are normalized by the number of nodes in each graph) is the pooling matrix as in the figure.
```python
class GCN(nn.Module):
    def __init__(self, dropout):
        super(GCN, self).__init__()
        self.gc1 = GraphConv(1, 16)
        self.gc2 = GraphConv(16, 32)
        self.fc1 = nn.Linear(32, 500)
        self.fc2 = nn.Linear(500, 10)
        self.dropout = dropout

    def forward(self, x, adj, ind):
        # Graph convolution layers
        x = F.relu(self.gc1(x, adj))
        x = F.relu(self.gc2(x, adj))
        # Pool node embeddings into per-graph embeddings before the FC layers
        x = torch.mm(ind, x)
        # Fully connected classifier
        x = F.relu(self.fc1(x))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.fc2(x)
        return x
```
I used this model quite some time ago for graph-level classification of the datasets provided in this paper: https://arxiv.org/abs/1605.05273, and got comparable (sometimes worse, sometimes better) results compared to their method. So maybe just have a look at these datasets. It shouldn't be too hard to load them in and run the model on them.
I edited your Keras graph convolution code (train.py) and got something working, though it doesn't work incredibly well. I am still a little confused: the model in train.py (again, in the Keras version) takes both the adjacency matrix and the features of the graph. However, I wish to train on the adjacency matrices themselves, so the feature matrix would be unnecessary. And for turning all of the node labels into a single graph label (for classification purposes), should we simply take the average of the node labels and go with the maximum? With the code as it is, I am still unsure how this would work well for whole-graph classification.
Thanks for your questions. For graph-level classification you essentially have two options:
* "hacky" version: you add a global node to the graph that is connected to all other nodes and run the standard protocol. You can then interpret the final value of this global node as the graph-level label.
* global pooling: as a last layer, you can implement a global pooling layer that performs some form of pooling operation (optionally with an attention mechanism, as in https://arxiv.org/abs/1511.05493) over all graph nodes.
For batch-wise training over multiple graph instances (of potentially different size), each with its own adjacency matrix, you can feed them to the model in the form of a block-diagonal adjacency matrix (each block corresponds to one graph instance), as illustrated in the figure below:
Hi,
I have a follow-up question here. Though the code has a featureless option, I do not think it works directly when featureless is set to True. I just wanted to confirm that.
Thanks for your great code!
Hi, thanks for the catch. For the featureless mode, you have to set the input dimension of the first layer (only the first layer supports featureless) to be the number of nodes in the graph. I'll make sure to add a sentence or two about this in the documentation at some point.
Hi,
I want to use this GCN framework for a semi-supervised regression task: given a graph (V, E) with a feature matrix, the (continuous) value of some nodes is known while others are not; the target is to predict the values of the unknown nodes. I directly changed the loss function to RMSE, but it doesn't work well on the validation and test sets. So:
1. Is GCN suitable for regression?
2. How should this framework be changed to meet the requirements of regression?
Thank you!
As far as I know, regression should work. But you may want to look at the normalization of A. If I remember correctly, the normalization method proposed by Kipf & Welling does not lead to the rows of the normalized adjacency matrix summing to 1.
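A quick NumPy check of this point, using the symmetric renormalization D^-1/2 (A + I) D^-1/2 from the paper on a 3-node path graph (a sketch, not repository code):

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric renormalization D^-1/2 (A + I) D^-1/2 from Kipf & Welling (2017)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# 3-node path graph: degrees differ, so the normalized rows do not sum to 1
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_norm = normalize_adj(A)
row_sums = A_norm.sum(axis=1)
```

The result is symmetric, but its rows are not a probability distribution, which can matter for regression targets.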
It might be a good idea to turn the regression problem into a classification problem by bucketing the real-valued targets into several classes. This typically (empirically) works quite well in combination with a (softmax) cross-entropy loss. See, e.g., the WaveNet paper (https://arxiv.org/abs/1609.03499) for details.
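A minimal sketch of the bucketing step with NumPy (the bin edges are arbitrary):

```python
import numpy as np

# Real-valued targets bucketed into 4 classes via fixed bin edges
y = np.array([0.05, 0.4, 0.75, 0.99])
bins = np.array([0.25, 0.5, 0.75])
classes = np.digitize(y, bins)   # class index per target, in {0, 1, 2, 3}
```

At prediction time you can map a predicted class back to a real value, e.g. by using the bucket centers.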
It might also work to cast the regression problem as a classification problem by augmenting the input data, similar to support vector regression. For example, regression from x to y can be converted to a classification task with data ([x, y+r], +1) and ([x, y-r], -1). Thus there is no need to change the normalization term.
That sounds interesting. What would 'r' be in this case?
'r' is a manually set real-valued margin, e.g., r = 0.8. The setting of 'r' typically depends on the range of y (to ensure most/all samples are separable) and on how much you'd like the samples to be separable. See here for details, where r is the epsilon in this case.
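A small NumPy sketch of this augmentation (the function name and the value of r are illustrative):

```python
import numpy as np

def regression_to_classification(X, y, r=0.8):
    """Turn (x, y) regression pairs into ([x, y+r], +1) and ([x, y-r], -1) samples."""
    X_pos = np.hstack([X, (y + r)[:, None]])   # labelled +1
    X_neg = np.hstack([X, (y - r)[:, None]])   # labelled -1
    X_aug = np.vstack([X_pos, X_neg])
    labels = np.concatenate([np.ones(len(y)), -np.ones(len(y))])
    return X_aug, labels

X = np.array([[1.0], [2.0]])
y = np.array([0.5, 1.5])
X_aug, labels = regression_to_classification(X, y, r=0.8)
```

A binary classifier trained on (X_aug, labels) then implicitly learns the regression surface as its decision boundary.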
 回答 2 已采纳 I am creating a new Facebook app and there are actions attached to it, like the 'Listening to xxx' on Spotify. Trouble is that the call takes around 67 seconds which is quite a long time. Beneath my code is the results of curl_getinfo. Is it supposed to be this slow? $attachment = array( 'access_token' => $access_token, 'album' => 'sergeant peppers', ); $opts = array( CURLOPT_CONNECTTIMEOUT => 10, CURLOPT_RETURNTRANSFER => true, CURLOPT_TIMEOUT => 60, CURLOPT_USERAGENT => 'facebookphp3.1', CURLOPT_POST => true, CURLOPT_POSTFIELDS => $attachment, CURLOPT_URL => 'https://graph.facebook.com/me/APPNAME:listening' ); $ch = curl_init(); curl_setopt_array($ch, $opts); $result = curl_exec($ch); $info = curl_getinfo($ch); curl_close($ch); The results of curl_getinfo: [url] => https://graph.facebook.com/me/APPNAME:listening [content_type] => text/javascript; charset=UTF8 [http_code] => 400 [header_size] => 557 [request_size] => 238 [filetime] => 1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 6.002449 [namelookup_time] => 0.024892 [connect_time] => 0.179322 [pretransfer_time] => 0.77444 [size_upload] => 362 [size_download] => 212 [speed_download] => 35 [speed_upload] => 60 [download_content_length] => 212 [upload_content_length] => 362 [starttransfer_time] => 1.775707 [redirect_time] => 0 [certinfo] => Array ( ) [redirect_url] =>
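The zero-padding and masking scheme discussed in this thread can be sketched as follows. This is a minimal NumPy illustration, not the tkipf/gcn API: the helper names (`pad_batch`, `gcn_graph_embed`) and all shapes are my own assumptions. Each adjacency matrix is symmetrically normalized, padded to the batch maximum `m`, and a node mask zeroes out padded rows before a mean-pooling readout produces one vector per graph:

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in Kipf & Welling.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def pad_batch(adjs, feats):
    """Zero-pad a list of (A_i, X_i) pairs into dense [B, m, m] and [B, m, F]
    tensors, plus a [B, m] node mask marking the real (non-padded) nodes."""
    B = len(adjs)
    m = max(a.shape[0] for a in adjs)
    F = feats[0].shape[1]
    A = np.zeros((B, m, m))
    X = np.zeros((B, m, F))
    mask = np.zeros((B, m))
    for i, (a, x) in enumerate(zip(adjs, feats)):
        n = a.shape[0]
        A[i, :n, :n] = normalize_adj(a)  # padded rows stay all-zero
        X[i, :n] = x
        mask[i, :n] = 1.0
    return A, X, mask

def gcn_graph_embed(A, X, mask, W):
    # One GCN layer (ReLU(A_hat X W)) followed by masked mean pooling,
    # giving a single embedding per graph for graph-level classification.
    H = np.maximum(A @ X @ W, 0.0)   # [B, m, hidden]
    H = H * mask[..., None]          # zero out padded nodes before pooling
    return H.sum(axis=1) / mask.sum(axis=1, keepdims=True)
```

Because the GCN weights `W` act only on the feature dimension (they are shared across nodes and graphs, not tied to any one adjacency matrix), the same parameters apply to graphs of different sizes and structures, which is why training over varied graphs can still converge.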