Compare sources to detect plagiarism

Compare sources to detect plagiarism

Miro Hrončok
Hi,
I have several dozen implementations of the same module (some of them
work as expected, some of them do not). They were submitted by
students as part of their assignment. Now I'd like to check whether
anyone simply copy-pasted a friend's code and changed the variable
names etc.

I was wondering whether comparing CSG trees would do the trick. Would
comparing two CSG trees with diff work? Or do I need to load the tree
into some dictionary-like structure and compare those?

Also, is there a command-line way to export the CSG tree?

Thanks for your tips.

Miro Hrončok

Phone: +420777974800

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org
Re: Compare sources to detect plagiarism

Felipe Sanches
If I were doing it, I would maybe write my own parser for the language and then compare the AST (abstract syntax tree) of each code sample.

On the other hand, you could encourage your students to share code and improve upon each other's work :-)



Re: Compare sources to detect plagiarism

Miro Hrončok
> If I were doing it, I would maybe write my own parser for the language and
> then compare the AST (abstract syntax tree) of each code sample.
That's something I don't (yet) want to do, if there are other options.

> On the other hand, you could encourage your students to share code and
> improve upon each other's work :-)
That would be great, but unfortunately it is very hard to evaluate :(


Re: Compare sources to detect plagiarism

Felipe Sanches
You can make the students use GitHub and then evaluate based on how intensely they collaborate on the platform as opposed to simply evaluating the final piece of code.


Re: Compare sources to detect plagiarism

Felipe Sanches
In other words... you can see the ongoing process of creating the solution, instead of simply evaluating the correctness of the final script.


Re: Compare sources to detect plagiarism

tp3
In reply to this post by Miro Hrončok
From: "Miro Hrončok" <[hidden email]>
> > If I were doing it, I would maybe write my own parser for the language and
> > then compare the AST (abstract syntax tree) of each code sample.
> That's something I don't (yet) want to do, if there are other options.
>
CSG output is basically a dump of the internal AST.

Exporting from the command line is possible by specifying csg as the output file type.

openscad -o output.csg input.scad
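
For a whole folder of submissions, the export and a first comparison could be scripted. The following is only a rough Python sketch: it assumes `openscad` is on the PATH and that one `.scad` file per student sits in a single directory. The function names `export_command`, `export_all` and `similarity` are made up for illustration, and the line-based `difflib` ratio is one possible measure, not something OpenSCAD provides.

```python
import difflib
import subprocess
from pathlib import Path
from typing import List


def export_command(scad_path: Path, out_dir: Path) -> List[str]:
    """Build the 'openscad -o output.csg input.scad' call for one file."""
    csg_path = out_dir / (scad_path.stem + ".csg")
    return ["openscad", "-o", str(csg_path), str(scad_path)]


def export_all(src_dir: Path, out_dir: Path) -> None:
    """Export every .scad file in src_dir (assumes openscad is installed)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for scad_path in sorted(src_dir.glob("*.scad")):
        subprocess.run(export_command(scad_path, out_dir), check=True)


def similarity(csg_a: str, csg_b: str) -> float:
    """Line-based similarity in [0, 1]; 1.0 means the texts are identical."""
    lines_a = csg_a.splitlines()
    lines_b = csg_b.splitlines()
    return difflib.SequenceMatcher(None, lines_a, lines_b).ratio()
```

Calling `export_all(Path("submissions"), Path("csg"))` and then feeding pairs of the resulting files to `similarity` gives a number per pair. A high value is only a hint that two files deserve a closer look.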

ciao,
  Torsten.


Re: Compare sources to detect plagiarism

Parkinbot
In reply to this post by Miro Hrončok
An approach like this is highly dependent on the problem.

- How many students do you have?
- How many different solutions will the problem have if it is solved in a straightforward way? How likely is it that two students independently arrive at the same solution (i.e. with only naming differences)?
- How many different solutions are syntactically equivalent (e.g. differing only in the ordering of union elements)?
- How many different solutions are semantically equivalent (is the problem completely characterized)?

Up to 100 students: I'd recommend just looking at the code for 10 minutes (6 seconds each). That way you could extract suspicious solutions for closer inspection, which would cost you another 20 minutes or so.
Writing "usable" software will cost you months! And how would you validate it?
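
The point about reordered union elements can be illustrated with a comparison that ignores line order. This is a rough Python sketch under the assumption that each primitive or operator sits on its own line in the exported text. The helper names are hypothetical, and because nesting is ignored the measure is deliberately coarse.

```python
from collections import Counter


def line_multiset(csg_text: str) -> Counter:
    """Count non-empty lines, ignoring indentation and order."""
    return Counter(line.strip() for line in csg_text.splitlines() if line.strip())


def multiset_similarity(csg_a: str, csg_b: str) -> float:
    """Shared lines divided by all lines (a multiset Jaccard index)."""
    bag_a, bag_b = line_multiset(csg_a), line_multiset(csg_b)
    shared = sum((bag_a & bag_b).values())
    total = sum((bag_a | bag_b).values())
    return shared / total if total else 1.0


# Made-up, simplified input: the same union with its children swapped.
first = "union() {\n\tcube(1);\n\tsphere(2);\n}"
second = "union() {\n\tsphere(2);\n\tcube(1);\n}"
print(multiset_similarity(first, second))
```

A plain diff would report the two made-up inputs as different, while this measure treats them as the same bag of lines. The price is that two structurally different trees built from the same primitives also score high, so it can only narrow down what to inspect.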



Re: Compare sources to detect plagiarism

Tim V. Shaporev
In reply to this post by Miro Hrončok
When I was a teacher I used to say something like
"It is just unrealistic to write everything yourself, so it is really
good if you can use others' code, but to demonstrate that you understood
what you did, please change this code in front of me in the following way..."

:-)

Just my $0.02
Tim


Re: Compare sources to detect plagiarism

Miro Hrončok
In reply to this post by tp3

Great, thanks.


Re: Compare sources to detect plagiarism

Miro Hrončok
In reply to this post by Parkinbot
2015-12-02 15:02 GMT+01:00 Parkinbot <[hidden email]>:
> An approach like this is highly dependent on the problem.
>
> - How many students do you have?
> - How many different solutions will the problem have if it is solved in a
> straightforward way? How likely is it that two students independently arrive
> at the same solution (i.e. with only naming differences)?

Actually, more than a few.

> - How many different solutions are syntactically equivalent (e.g. ordering
> of union-elements)

That's what I'm not sure about.

> - How many different solutions are semantically equivalent (is problem
> completely characterized?)

It is.

> Up to 100 students: I'd recommend just looking at the code for 10 minutes
> (6 seconds each). That way you could extract suspicious solutions for closer
> inspection, which would cost you another 20 minutes or so.
I've tried that, but I simply cannot hold that much information in my
head. It's slightly less than 100.

> Writing "usable" software will cost you months!
That's why I'm just trying to use what's already available.

> And how would you validate
> it?
I would inspect the code manually after finding similarities in the
CSG tree. Then I would get the students to explain to me why they did
this and that, and to change the module to act differently, etc.
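
With slightly fewer than 100 submissions there are fewer than 5,000 pairs, so ranking every pair and reading only the top of the list is feasible. The following Python sketch shows one way to build that shortlist. The function name `rank_pairs` and the 0.8 threshold are arbitrary assumptions, and the input is assumed to be a mapping from student name to exported CSG text.

```python
import difflib
from itertools import combinations


def rank_pairs(csg_by_student: dict, threshold: float = 0.8) -> list:
    """Return (score, name_a, name_b) for pairs at or above threshold, highest first."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(csg_by_student.items()), 2):
        matcher = difflib.SequenceMatcher(None, text_a.splitlines(), text_b.splitlines())
        score = matcher.ratio()
        if score >= threshold:
            flagged.append((score, name_a, name_b))
    return sorted(flagged, reverse=True)


# Made-up data: two identical trees and one unrelated tree.
submissions = {"alice": "a\nb\nc", "bob": "a\nb\nc", "carol": "x\ny\nz"}
print(rank_pairs(submissions))
```

The output is only a list of candidates for the manual inspection and the follow-up conversation with the students. The threshold would need tuning on real submissions, since a tightly specified assignment can make honest solutions look alike.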

Thanks for all the feedback.


Re: Compare sources to detect plagiarism

clothbot
One other thing I would suggest is to require each student to sign and watermark their work with text() calls and/or simple surface() image maps. This:

- guarantees every model is at least a little unique
- presents opportunities to include side-bar discussions on intellectual property matters such as copyrights, trademarks, creative commons licensing, etc.
- provides a little incentive to “take ownership” of their models because they have been personalized.

Andrew.


--

"The future is already here.  It's just not very evenly distributed" -- William Gibson

Me: http://clothbot.com/wiki/





Re: Compare sources to detect plagiarism

doug.moen
In reply to this post by Miro Hrončok
At a previous job, we used cpd, an open source copy & paste detector, to find dodgy code where a developer had simply copied and pasted large blocks of code, rather than using abstraction. So, not an educational use case.

But, you could use 'cpd *.scad' to find code duplicated between different source files.

Here's a project similar to what I remember:
pmd.sourceforge.net/pmd-4.3.0/cpd.html


Re: Compare sources to detect plagiarism

Peter Falke
In reply to this post by clothbot
My 2 cents:

Just give a new homework assignment that expands on the concept of the last one.

Do this 3 times.

After that, the students who did their homework can follow your classwork.

The ones who only copied their work will not understand and will be bored.

They won't enroll in the next course and/or will fail any exam.


Re: Compare sources to detect plagiarism

kintel
Administrator
In reply to this post by Miro Hrončok
My Prolog teacher at university managed to do this. She even detected solutions copied and renamed from earlier cohorts.
No idea how, but she did hold a PhD in some AI topic : /

 -Marius


Re: Compare sources to detect plagiarism

Felipe Sanches
The same thing happened at my university as well. And there was a "black market" of older students selling the service of programming these exercises for the younger students who were willing to pay for it. Prices were around 30 dollars for each exercise. There were even people charging 3 dollars per grade point. As grades went from 0 to 10, if your grade was 8, then you would pay 24 dollars for the service.

People very often asked me if I could offer that kind of service, but I never did it, because I knew it was a corrupt/evil system.

There were even people developing tools to automatically generate code variants that supposedly could not be detected by the anti-copy system! I remember that my most frequent thought was: "what a waste of educational opportunities... These kids could be learning much more if they were invited into engaging with some sort of collaborative development practice, and corruption in such a collaborative system would be so much harder to endure..."

This was approximately 10 years ago. Nowadays, it seems nothing has really changed much, unfortunately. :-(


Re: Compare sources to detect plagiarism

johnmdanskin
There are -lots- of tools for detecting academic plagiarism. If you want something automatic, best to use a real tool. Here is a recent list of reviewed tools.
http://www.edudemic.com/the-5-best-plagiarism-detection-tools-for-educators/

Re: Compare sources to detect plagiarism

doug.moen
Cpd is better, though, if you specifically want to compare program structure while ignoring changes to variable names.
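
The general idea of comparing structure while ignoring names can be sketched in a few lines of Python. This is an illustration of the principle only, not how cpd itself is implemented. The `KEEP` list of built-in names is a hypothetical, incomplete selection, and comments and strings are not handled.

```python
import re

# Hypothetical keep-list: built-in names that carry structure and stay unmasked.
KEEP = {"module", "function", "for", "if", "else", "union", "difference",
        "intersection", "translate", "rotate", "scale", "cube", "sphere",
        "cylinder"}

# Identifiers, numbers, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_]*|\d+\.?\d*|\S")


def normalize(source: str) -> list:
    """Tokenize and replace user-chosen identifiers with the placeholder ID."""
    tokens = []
    for tok in TOKEN.findall(source):
        is_name = tok[0].isalpha() or tok[0] in "_$"
        tokens.append("ID" if is_name and tok not in KEEP else tok)
    return tokens


# Made-up snippets that differ only in the variable name.
print(normalize("width = 5; cube(width);") == normalize("w = 5; cube(w);"))
```

Two sources that differ only in variable names produce the same token list, so any sequence comparison applied to the normalized tokens is unaffected by renaming. Changed numeric values still show up as differences.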


Re: Compare sources to detect plagiarism

Felipe Sanches
In reply to this post by johnmdanskin
There's a huge software freedom issue with all five of the "solutions" suggested in this article. But even more damaging than that is the widespread will to frame copying as a bad thing, by using terms like "plagiarism" to refer to it.

Copying is a reality! It should be incorporated into the educational framework, instead of being banished as some sort of crime or blasphemy. Refusing to acknowledge that copying happens, and that it will continue to happen as part of our daily digital lives, will only perpetuate (or even increase) the distance between educational practices and the realities of our modern world.

These kids should all be writing patches and making pull requests!


Re: Compare sources to detect plagiarism

Dr Nicholas J Bailey
On Wednesday 02 December 2015 14:41:51 Felipe Sanches wrote:
> These kids should all be writing patches and making pull requests!

Well, I'm a University lecturer, and what you say is a breath of fresh air
Felipe. It is said, "Copying from one person is plagiarism; copying from lots
of people is research". I encourage copying when it promotes understanding (as
in your scenario) but in a commoditized educational environment,
establishments reach for metrics to prove they are getting their assessments
right, and one of those is "similarity" (which is falsely taken to mean a lack
of originality).

I once had an excellent PhD student who got accused of copying by an
antiplagiarism program. It turned out he was judged to have copied from a
paper which had quoted one he had previously written himself while working at a
different institution!

We have a research group drinking game now. Every time somebody is falsely
accused of plagiarism, the rest of the group has to buy him or her a drink.
Fortunately it's a small research group...

I'll shut up now because this doesn't really have much to do with OpenSCAD :)

Nick/.





Re: Compare sources to detect plagiarism

Felipe Sanches
Cheers!!! :-D
